Find Everything You're Looking For:
Integrating Full-Text Search and Windchill 8.0

Leading manufacturing companies have made it a corporate priority to reduce product costs, increase engineering productivity, and speed time to market through parts reuse. But PDM systems, which are designed for version and revision control, often don't support reuse searches or knowledge-based methodologies, and this makes parts libraries difficult to use.

Full-text searching helps to solve this problem, allowing documents and parts to be indexed and then searched by their attributes and the contents of their occurrences. This means that users who need to find something quickly in ProjectLink or PDMLink can simply type the name of what they are looking for and immediately see a list of alternative documents, parts, and assemblies.

In this article we discuss how to integrate Apache Lucene, a full-featured text search engine library written entirely in Java, with ProjectLink and PDMLink. This open-source library is suitable for nearly any application that requires full-text search, especially cross-platform applications. It can be downloaded free from www.apache.org.

Using Digester to Parse an XML File

To get started, we use Commons Digester to parse a simple XML file. Digester provides a high-level interface for mapping XML documents to Java objects and can be customized to work with Windchill WT objects. (Digester requires a few additional Java libraries, including an XML parser compatible with either SAX 2.0 or JAXP 1.1.)

•  XML Example of an Assembly (CSV or XML Files Imported from PDMLink)

The box below contains entries about WTParts. To demonstrate how elements with and without attributes are handled, this example gives the <part> element a type attribute and leaves all other elements without attributes.

<?xml version='1.0' encoding='utf-8'?>
<assembly-part>
    <part type="part1">
        <attribute_1>ATTRIBUTE_1</attribute_1>
        <attribute_2>ATTRIBUTE_2</attribute_2>
        <attribute_3>ATTRIBUTE_3</attribute_3>
        ...
        <attribute_n>ATTRIBUTE_N</attribute_n>
    </part>
    <part type="part2">
        <attribute_1>ATTRIBUTE_1</attribute_1>
        <attribute_2>ATTRIBUTE_2</attribute_2>
        <attribute_3>ATTRIBUTE_3</attribute_3>
        ...
        <attribute_n>ATTRIBUTE_N</attribute_n>
    </part>
</assembly-part>

•  Parse XML Data

Using Digester to parse the XML document is very simple. All of the setup happens in the main() method. The first rule tells Digester to create an instance of the WTAssemblyParser class when it encounters the pattern assembly-part. Because <assembly-part> is the root element of the XML file, this rule is the first one triggered when we run Digester on the file.

Digester digester = new Digester();   // org.apache.commons.digester.Digester
digester.addObjectCreate("assembly-part", WTAssemblyParser.class);
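A Digester pattern such as "assembly-part/part" is the slash-separated path of element names from the root to the current element. The sketch below shows how those paths arise as a parser walks a document. It uses only the JDK's SAX parser, not Digester's own rule-matching code, and the PatternTracker class is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: records the slash-separated path of every element
 *  as it is opened, in the same form that Digester patterns are written. */
class PatternTracker extends DefaultHandler {
    private final Deque<String> openElements = new ArrayDeque<>();
    private final List<String> pathsSeen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        openElements.addLast(qName);                      // descend one level
        pathsSeen.add(String.join("/", openElements));    // root-to-current path
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.removeLast();                        // back up one level
    }

    /** Parses the XML string and returns the element paths in document order. */
    static List<String> paths(String xml) {
        PatternTracker tracker = new PatternTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), tracker);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return tracker.pathsSeen;
    }
}
```

On a small sample of the assembly XML, the first path produced is "assembly-part", followed by "assembly-part/part". This ordering is why the assembly-part rule fires before any other rule.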

•  Add Object

The next rule instructs Digester to create an instance of the Part class (an extended class could be used to create a WTPart instead) when it finds a <part> child element under the <assembly-part> parent.

digester.addObjectCreate("assembly-part/part", Part.class);

 

•  Set Property

Then we set the type property of the part instance when Digester finds the type attribute of the <part> element.

digester.addSetProperties("assembly-part/part", "type", "type");

 

•  Define Rules

Our WTAssemblyParser class contains several rules similar to the one shown below. Each rule instructs Digester to invoke a setter on the Part instance. In this case, setAttribute_1() is called with the text enclosed by the <attribute_1> element as its parameter.

digester.addCallMethod("assembly-part/part/attribute_1", "setAttribute_1", 0);   // 0 = pass the element's body text

 

•  Add Rules

This rule tells Digester to call the addPart() method of the parent WTAssemblyParser object, passing it the current Part, when it finds the closing </part> element.

digester.addSetNext("assembly-part/part", "addPart");
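The next sketch shows what the four rules do together. It applies them by hand with the JDK's SAX parser rather than with Digester. Part and WTAssemblyParser are hypothetical minimal beans that contain only the members the rules above call (setType(), setAttribute_1(), and addPart()); the real classes would carry all n attributes. For brevity, the handler matches on element names rather than full paths, which is enough for this XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical bean with the members used by the rules in the text. */
class Part {
    private String type;
    private String attribute_1;
    public void setType(String type) { this.type = type; }
    public String getType() { return type; }
    public void setAttribute_1(String value) { this.attribute_1 = value; }
    public String getAttribute_1() { return attribute_1; }
}

/** Hypothetical root object that collects the parts handed to addPart(). */
class WTAssemblyParser {
    private final List<Part> parts = new ArrayList<>();
    public void addPart(Part part) { parts.add(part); }
    public List<Part> getParts() { return parts; }
}

/** Applies the four Digester rules by hand (a sketch, not Digester itself). */
class AssemblyRulesHandler extends DefaultHandler {
    private WTAssemblyParser root;
    private Part current;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes attrs) {
        text.setLength(0);                                  // collect fresh body text
        if (qName.equals("assembly-part")) {
            root = new WTAssemblyParser();                  // addObjectCreate("assembly-part", ...)
        } else if (qName.equals("part")) {
            current = new Part();                           // addObjectCreate("assembly-part/part", ...)
            current.setType(attrs.getValue("type"));        // addSetProperties(..., "type", "type")
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("attribute_1") && current != null) {
            current.setAttribute_1(text.toString().trim()); // addCallMethod(..., "setAttribute_1", 0)
        } else if (qName.equals("part")) {
            root.addPart(current);                          // addSetNext(..., "addPart")
            current = null;
        }
    }

    static WTAssemblyParser parse(String xml) {
        try {
            AssemblyRulesHandler handler = new AssemblyRulesHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.root;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse assembly XML", e);
        }
    }
}
```

Digester performs this wiring generically through reflection. That is why each rule needs only a pattern string and a class or method name.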

 

Creating Lucene Indices

Now we can use Lucene to create indices for WTObjects. There are four fundamental Lucene classes for indexing text: (1) IndexWriter, (2) Analyzer, (3) Document, and (4) Field. The IndexWriter class creates new indexes and adds documents to them. Before text is added to an index, it is passed through an Analyzer, which extracts indexable tokens from the text. Lucene comes with several Analyzer implementations.

We can also write our own analyzers, such as a WTPartAnalyzer or a WTDocumentAnalyzer, for specific needs. For example, some analyzers skip stop words (frequently used words that don't help distinguish one document from another, such as a, an, the, in, and on), while others convert all tokens to lowercase so that searches are not case-sensitive. An index consists of a set of documents, and each document consists of one or more fields.
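The two analysis steps described above can be illustrated in plain Java. This is not Lucene's Analyzer API; a real WTPartAnalyzer would extend Analyzer. ToyAnalyzers and its short stop-word list are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical plain-Java illustration of two analysis steps. */
final class ToyAnalyzers {
    // Small illustrative stop-word list (an assumption; Lucene ships its own).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "on", "the"));

    private ToyAnalyzers() {}

    /** Splits on whitespace only, so case is preserved. */
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);     // skip the empty token from blank input
        }
        return tokens;
    }

    /** Whitespace tokens, lowercased, with stop words removed. */
    static List<String> lowercaseNoStopWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : whitespace(text)) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) tokens.add(lower);
        }
        return tokens;
    }
}
```

For example, "The Mounting Bracket for a Pump" yields the tokens mounting, bracket, for, and pump. Whichever analysis is used at indexing time should also be applied to query text. Otherwise a query for "bracket" will not match a token indexed as "Bracket".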

•  Lucene-Based WTPart Indexer

Let's consider a simple scenario in which we add a single part entry with all its fields to the index.

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

/**
 * <code>WTAssemblyIndexer</code> class provides a simple
 * example of indexing with Lucene. It creates a fresh
 * index called "assembly-part" in a temporary directory every
 * time it is invoked and adds a single document with a
 * few fields to it.
 */
public class WTAssemblyIndexer
{
    public static void main(String[] args) throws Exception
    {
        // index directory: <java.io.tmpdir>/assembly-part
        String indexDir =
            System.getProperty("java.io.tmpdir", "temp") +
            System.getProperty("file.separator") + "assembly-part";
        Analyzer analyzer = new WhitespaceAnalyzer();
        boolean createFlag = true;   // true = create a new index, overwriting any existing one
        IndexWriter writer = new IndexWriter(indexDir, analyzer, createFlag);
        try
        {
            // Field.Text(...) is part of the Lucene 1.x API used in this article
            Document partDocument = new Document();
            partDocument.add(Field.Text("ATTRIBUTE_1", "attribute_value_1"));
            partDocument.add(Field.Text("ATTRIBUTE_2", "attribute_value_2"));
            partDocument.add(Field.Text("ATTRIBUTE_3", "attribute_value_3"));
            ...
            partDocument.add(Field.Text("ATTRIBUTE_N", "attribute_value_n"));
            writer.addDocument(partDocument);
        }
        finally
        {
            writer.close();   // saves all index changes to disk
        }
    }
}

The first parameter in IndexWriter's constructor specifies the directory where the index should be stored. The second parameter provides the Analyzer implementation used to preprocess the text. The implementation used here, WhitespaceAnalyzer, tokenizes the input using whitespace characters as delimiters. The last parameter is a boolean flag. When it is true, IndexWriter creates a new index in the specified directory, overwriting any existing index there. When it is false, IndexWriter adds Documents to the existing index instead.

We then create a blank Document and add several text Fields to it. After the Document is populated, we add it to the index through the IndexWriter instance. Finally, we close the index. Closing the IndexWriter is important because it ensures that all index changes are saved to disk.
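Conceptually, what IndexWriter builds is an inverted index. For each field, every token produced by the analyzer points back to the documents that contain it. The toy in-memory version below uses whitespace tokenization (mirroring WhitespaceAnalyzer), and its names are hypothetical. Lucene's actual on-disk index format is far more elaborate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical toy in-memory inverted index (not Lucene's format). */
final class ToyIndex {
    // stored copy of each document; list position = document id
    private final List<Map<String, String>> storedDocs = new ArrayList<>();
    // field -> term -> ids of documents containing that term in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new TreeMap<>();

    /** Adds a document (field name -> text) and returns its id. */
    int addDocument(Map<String, String> fields) {
        int id = storedDocs.size();
        storedDocs.add(new LinkedHashMap<>(fields));
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // whitespace tokenization, case preserved
            for (String term : field.getValue().trim().split("\\s+")) {
                if (term.isEmpty()) continue;
                postings.computeIfAbsent(field.getKey(), k -> new TreeMap<>())
                        .computeIfAbsent(term, k -> new TreeSet<>())
                        .add(id);
            }
        }
        return id;
    }

    /** Ids of documents whose given field contains exactly this term. */
    Set<Integer> docsWith(String field, String term) {
        Map<String, Set<Integer>> terms = postings.get(field);
        if (terms == null || !terms.containsKey(term)) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(terms.get(term));
    }

    /** The stored fields of a document, as returned with a search hit. */
    Map<String, String> storedDocument(int id) {
        return Collections.unmodifiableMap(storedDocs.get(id));
    }
}
```

With this structure, looking up a term is a map access rather than a scan over every document. That difference is what makes index-based search fast.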

Integrating the Digester and Lucene Tools

Now that you know how to use these tools on their own, we can combine the two classes we've written, using Digester to handle XML parsing and Lucene to handle indexing. Let's look at some selections from the combined class in more detail. As in the WTAssemblyIndexer class, we open the Lucene index for writing using IndexWriter. We pass in the path to the index directory, the Analyzer that processes all data being indexed, and a createFlag. The flag is set to true, so a new index is created (overwriting any existing one); passing false would open an existing index in append mode instead.

•  Open the Index for Writing

// directory where the index files are stored
String indexDir =
    System.getProperty("java.io.tmpdir", "temp") +
    System.getProperty("file.separator") + "assembly-part";
Analyzer analyzer = new WhitespaceAnalyzer();
boolean createFlag = true;   // true = create a new index, overwriting any existing one
// IndexWriter used to add parts to the index; it is a field of the class
// so that addPart(Part) can use it
writer = new IndexWriter(indexDir, analyzer, createFlag);

 

•  Use addPart(Part) Method to Add the Document to the Index

The modified addPart(Part) method shown below now creates a fresh Lucene Document every time it is called. After the Document is populated with data from the Part instance passed into the method, it is added to the index through the IndexWriter.

public void addPart(Part part) throws IOException
{
    Document wtDocument = new Document();
    /*
     * An additional interface could be provided here to build the
     * Lucene Document from a WTDocument (or WTPart)
     */
    wtDocument.add(Field.Text("ATTRIBUTE_1", part.getAttribute_1()));
    wtDocument.add(Field.Text("ATTRIBUTE_2", part.getAttribute_2()));
    wtDocument.add(Field.Text("ATTRIBUTE_3", part.getAttribute_3()));
    ...
    wtDocument.add(Field.Text("ATTRIBUTE_N", part.getAttribute_N()));
    writer.addDocument(wtDocument);
}
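If some attributes of a Part can be missing, it is safer to skip them than to hand Lucene a null or empty value. The sketch below collects only the non-empty attributes, in a fixed order. The Part bean shown here and the PartFieldMapper class are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical Part bean with three of the attributes used above. */
class Part {
    private final String attribute_1, attribute_2, attribute_3;
    Part(String a1, String a2, String a3) { attribute_1 = a1; attribute_2 = a2; attribute_3 = a3; }
    String getAttribute_1() { return attribute_1; }
    String getAttribute_2() { return attribute_2; }
    String getAttribute_3() { return attribute_3; }
}

/** Hypothetical mapper from a Part to (field name -> value) pairs. */
final class PartFieldMapper {
    private PartFieldMapper() {}

    /** Field name -> value in a fixed order, skipping null or blank attributes. */
    static Map<String, String> toFields(Part part) {
        Map<String, String> fields = new LinkedHashMap<>();
        putIfPresent(fields, "ATTRIBUTE_1", part.getAttribute_1());
        putIfPresent(fields, "ATTRIBUTE_2", part.getAttribute_2());
        putIfPresent(fields, "ATTRIBUTE_3", part.getAttribute_3());
        return fields;
    }

    private static void putIfPresent(Map<String, String> fields, String name, String value) {
        if (value != null && !value.trim().isEmpty()) {
            fields.put(name, value);
        }
    }
}
```

Inside addPart(), each entry of toFields(part) would then be added with Field.Text(name, value), as in the listing above.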

 

•  Optimize and Close the Index

Finally, at the end of the main() method, the index is optimized and closed to ensure that all Parts added are indeed written to the index on the disk.

// optimize and close the index
writer.optimize();
writer.close();
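If an exception occurs while parts are being added, the writer should still be closed. Otherwise the index's write lock stays held and pending changes may be lost. The sketch below shows the try/finally pattern. PartIndexSink is a hypothetical interface standing in for the three IndexWriter calls used above, and RecordingSink is a hypothetical test double.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the IndexWriter calls used above. */
interface PartIndexSink {
    void add(String part);
    void optimize();
    void close();
}

final class SafeIndexing {
    private SafeIndexing() {}

    /** Adds every part, then optimizes; close() runs even if an add fails. */
    static void indexAll(List<String> parts, PartIndexSink sink) {
        try {
            for (String p : parts) {
                sink.add(p);
            }
            sink.optimize();   // reached only when every add succeeded
        } finally {
            sink.close();      // always release the index
        }
    }
}

/** Hypothetical sink that records the calls it receives. */
class RecordingSink implements PartIndexSink {
    final List<String> calls = new ArrayList<>();
    public void add(String part) {
        if (part == null) throw new IllegalArgumentException("null part");
        calls.add("add " + part);
    }
    public void optimize() { calls.add("optimize"); }
    public void close() { calls.add("close"); }
}
```

The try/finally pattern works with any Java version and any writer class. Optimizing only after a fully successful run avoids spending time on an index that is about to be rebuilt anyway.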

Using Lucene Classes to Search Text

We can now create a Lucene index from an assembly whose part entries are encoded in XML. Next, we run a query that looks for all parts containing the keyword "Zane" in a field called name. This assumes that each part's name was indexed in a name field.

1. Search the Assembly Index Created with the Lucene Indexer

// org.apache.lucene
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
// java
import java.io.IOException;
import java.util.logging.Logger;

/**
 * <code>WTAssemblySearcher</code> class provides a simple
 * example of searching with Lucene. The index being searched
 * is called "assembly-part". In a real deployment it would be located
 * in an appropriate directory; in this example it is stored in the temp folder.
 */
public class WTAssemblySearcher
{
    // logging system (java.util.logging here; any logging framework will do)
    private static final Logger log =
        Logger.getLogger(WTAssemblySearcher.class.getName());

    public static void main(String[] args) throws IOException
    {
        // index folder: must match the directory used by the indexer
        String indexDir =
            System.getProperty("java.io.tmpdir", "temp") +
            System.getProperty("file.separator") + "assembly-part";
        // searcher (you can use your own IndexSearcher-based class)
        IndexSearcher searcher = new IndexSearcher(indexDir);
        try
        {
            /*
             * Simple Lucene query. In the real world it should be embedded
             * in WTQuery and the PDMLink search framework (see below).
             */
            Query query = new TermQuery(new Term("name", "Zane"));
            Hits hits = searcher.search(query);
            System.out.println("Number of matching parts: " + hits.length());
            log.info("Number of matching parts: " + hits.length());
            for (int i = 0; i < hits.length(); i++)
            {
                System.out.println("Name: " + hits.doc(i).get("name"));
            }
        }
        finally
        {
            searcher.close();
        }
    }
}

The IndexSearcher class is used to access an existing index. The argument passed to its constructor is the path to the directory where the index is stored. Lucene provides several query types, the simplest being TermQuery. Calling IndexSearcher's search(Query) method executes the search against the index and returns the matching Documents in an instance of Hits.
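A TermQuery is not analyzed, so its term must match an indexed token exactly. Because WhitespaceAnalyzer does not lowercase, "Zane" and "zane" are different terms in this index. The linear-scan sketch below (a hypothetical ToyTermQuery in plain Java, not Lucene) illustrates that matching rule. A real index answers the same question with an inverted-index lookup instead of a scan.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of term-query matching over whitespace-tokenized names. */
final class ToyTermQuery {
    private ToyTermQuery() {}

    /** Positions of names that contain the exact token, compared case-sensitively. */
    static List<Integer> search(List<String> names, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            for (String token : names.get(i).trim().split("\\s+")) {
                if (token.equals(term)) {   // exact, case-sensitive token match
                    hits.add(i);
                    break;                  // count each document once
                }
            }
        }
        return hits;
    }
}
```

For the names "Zane Bracket", "zane bolt", and "Zanes Washer", a search for Zane matches only the first name. "Zanes" is a different token, and "zane" differs in case.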

For more advanced search designs, you can use other Lucene capabilities and integrate them with the Windchill API (part of this is already done in PDMLink and ProjectLink). For instance, Lucene supports several other query types, including boolean queries, phrase queries, and wildcard queries. Lucene also lets you search multiple indexes at once, as well as indexes located on remote computers (distributed search). Another useful feature is Lucene's QueryParser, which supports a powerful, user-friendly query syntax that can be integrated into the Windchill WTQuery framework.
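Boolean and wildcard queries can be understood as set operations over the posting lists of an inverted index. The sketch below illustrates this idea in plain Java under two simplifications: it uses a single field and does no scoring. ToyQueries is hypothetical and does not reflect how Lucene's query classes are implemented internally.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical boolean and prefix-wildcard queries over one field's term dictionary. */
final class ToyQueries {
    // term -> ids of documents containing it, kept in sorted term order
    private final NavigableMap<String, Set<Integer>> terms = new TreeMap<>();

    void add(int docId, String... tokens) {
        for (String t : tokens) {
            terms.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Documents containing the exact term (a copy, safe to modify). */
    Set<Integer> term(String t) {
        Set<Integer> result = new TreeSet<>();
        Set<Integer> postings = terms.get(t);
        if (postings != null) result.addAll(postings);
        return result;
    }

    /** AND: documents matching both operands. */
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    /** OR: documents matching either operand. */
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    /** Prefix wildcard such as "brack*": union of postings of every term with that prefix. */
    Set<Integer> prefix(String p) {
        Set<Integer> r = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : terms.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) break;   // terms are sorted: no later term can match
            r.addAll(e.getValue());
        }
        return r;
    }
}
```

Because the term dictionary is sorted, all terms that share a prefix sit next to each other. This lets a prefix query stop at the first non-matching term instead of visiting the whole dictionary.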

2. The QSearchCustomIndexer Class

package com.search.indexer;

...

import org.apache.lucene.document.Document;
//##begin user.imports preserve=yes
import org.apache.lucene.index.Term;
import org.apache.lucene.document.Field;
//##end user.imports

//##begin QSearchCustomIndexer%1328674.doc preserve=no
//##end QSearchCustomIndexer%1328674.doc

public abstract class QSearchCustomIndexer implements SearchIndexer, Externalizable {

   // --- Attribute Section ---
   private static final String RESOURCE = "com.search.indexer.indexerResource";
   private static final String CLASSNAME = QSearchCustomIndexer.class.getName();
   static final long serialVersionUID = 1;
   ...

   // WARNING: Fields placed in this section will not be generated into externalization methods.
   //##begin user.attributes preserve=yes
   //##end user.attributes

   //##begin static.initialization preserve=yes
   //##end static.initialization

   ...

   //##begin getDocumentToIndex%41239D2D0371g.doc preserve=no
   /**
    * Retrieve a Lucene Document to index based on the given object.
    *
    * This method creates a new Document, calls the getUniqueTerm() method,
    * and adds the unique Term as a Field in the Document.
    *
    * @param      toIndex   Object on which to base the Document
    * @return     Document
    * @exception com.search.exception.QSearchException
    **/
   //##end getDocumentToIndex%1329C230361g.doc
   public Document getDocumentToIndex( Object toIndex )
            throws QSearchException {
      //##begin getDocumentToIndex%1329C230361g.body preserve=yes
        // build a new Document; subclasses supply getUniqueTerm()
        Document document = new Document();
        // every indexed object records its class name (stored, indexed, tokenized)
        document.add( new Field( "className", toIndex.getClass().getName(), true, true, true) );
        // the unique Term identifies this object in the index (stored, indexed, not tokenized)
        Term uniqueTerm = getUniqueTerm(toIndex);
        if (uniqueTerm != null) {
            document.add( new Field(uniqueTerm.field(), uniqueTerm.text(), true, true, false) );
        }
        return document;
      //##end getDocumentToIndex%1329C230361g.body
   }

   //##begin user.operations preserve=yes
   //##end user.operations
}
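getDocumentToIndex() stores a unique Term alongside the class name. One common reason for such a term is to let the indexer find and replace an object's earlier Document when the object is re-indexed, instead of adding a duplicate. The plain-Java sketch below illustrates that delete-then-add idea. UniqueTermStore and the "oid" field are hypothetical. In Lucene 2.1 and later, IndexWriter.updateDocument(Term, Document) serves the same purpose.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical store keyed on a unique-term field: re-indexing replaces, never duplicates. */
final class UniqueTermStore {
    private final String uniqueField;
    // unique-term value -> that object's stored fields
    private final Map<String, Map<String, String>> docsByUniqueValue = new LinkedHashMap<>();

    UniqueTermStore(String uniqueField) { this.uniqueField = uniqueField; }

    /** Mirrors getDocumentToIndex(): a className field plus the unique term as a field. */
    static Map<String, String> documentFor(Object toIndex, String uniqueField, String uniqueValue) {
        Map<String, String> doc = new HashMap<>();
        doc.put("className", toIndex.getClass().getName());
        doc.put(uniqueField, uniqueValue);
        return doc;
    }

    /** Deletes any earlier version of the object, then adds the new one. */
    void index(Map<String, String> doc) {
        String key = doc.get(uniqueField);
        if (key == null) {
            throw new IllegalArgumentException("document has no " + uniqueField + " field");
        }
        docsByUniqueValue.remove(key);
        docsByUniqueValue.put(key, doc);
    }

    int size() { return docsByUniqueValue.size(); }

    Map<String, String> get(String uniqueValue) {
        Map<String, String> doc = docsByUniqueValue.get(uniqueValue);
        return doc == null ? null : Collections.unmodifiableMap(doc);
    }
}
```

Without the unique term, every re-index of a changed part would leave a stale copy in the index. Searches would then return the same part more than once.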

 

Dmitry Tkach is a software architect at EjinZ Inc. (www.ejinz.com).
