
Friday, May 21, 2010

Using Solr Search With RDBMS

In many shops, some of the most common queries against large-scale RDBMS systems such as Oracle are pattern searches across ranges of criteria: targeted searches by users looking for data to answer specific business questions. Standardized reports or simple relational queries can answer these questions, but such mechanisms tend to be inflexible and costly to maintain. A more efficient way to address these challenges is through the power of Solr.
Getting the Data
After installing Solr on a standalone server outside of the production complex, the next step was configuring Solr so that we could get the data we needed. A few decisions were made at this point. The first concerned the data itself: I decided to target many of the existing information structures within the application, which had already been simplified to meet other business reporting needs; using these structures would also make configuration easier later on. The second decision was whether to store the data values in the index itself. While ideally the data would have been retrieved from the production database instance, I decided instead to store it within the index for easier retrieval and to reduce the queries against the production database instance itself. The final decision was how much data could safely be retrieved via the DataImportHandler and stored within Solr. This turned out to be pretty simple: the Oracle structures only held a week's worth of data, per an agreement with the business users, so I would start with that amount and determine from there how much more could be held within the Solr instance.
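As a rough sketch, a DataImportHandler configuration along these lines can pull a week's worth of rows from Oracle into the index. The connection details, table, and column names here are entirely hypothetical; the real schema would of course differ:

```xml
<!-- data-config.xml: hypothetical connection, table, and column names -->
<dataConfig>
  <dataSource driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@dbhost:1521:PROD"
              user="solr_ro" password="secret"/>
  <document>
    <entity name="order"
            query="SELECT order_id, account_id, order_type, postal_code, created_dt
                   FROM orders
                   WHERE created_dt &gt;= SYSDATE - 7">
      <field column="ORDER_ID" name="id"/>
      <field column="ACCOUNT_ID" name="account_id"/>
      <field column="ORDER_TYPE" name="order_type"/>
      <field column="POSTAL_CODE" name="postal_code"/>
      <field column="CREATED_DT" name="created_dt"/>
    </entity>
  </document>
</dataConfig>
```

A full import can then be triggered against the handler (for example, `/dataimport?command=full-import`), and the WHERE clause keeps the pull limited to the agreed one-week window.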
Searching with Solr
Once imported, the data was not very large, only about 50GB overall. This size could be managed further by adjusting the field types, deciding which fields actually had to be stored, and limiting the amount of historical information imported. Now that the data was available, searches could be executed against it.
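A minimal query against the index can be issued over HTTP. The sketch below only builds the request URL (the host, core path, and field names are assumptions, not from the original setup):

```python
from urllib.parse import urlencode

# Hypothetical Solr host and select handler path.
SOLR_SELECT = "http://solr-host:8983/solr/select"

def build_query(q, rows=10, **params):
    """Build a Solr select URL for the standard query handler."""
    query = {"q": q, "rows": rows, "wt": "json"}
    query.update(params)
    return SOLR_SELECT + "?" + urlencode(query)

# Find residential orders in a given postal code (hypothetical fields).
url = build_query("order_type:RESIDENTIAL AND postal_code:30303")
```

The resulting URL can be fetched with any HTTP client; `wt=json` asks Solr for a JSON response instead of the default XML.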
I also found the packaged Schema Browser very handy. Admittedly, the Schema Browser takes a while to process all the fields in the index, so if you have a lot of data expect some delay. The benefit is that it can answer some of the more common questions, such as: the number of documents per value, which helps for groups of items such as types of orders; how many documents actually have parent accounts; how many orders come from each sending system; and how many orders are for a given state or postal code. More advanced faceted searches can yield additional insights, such as which postal codes are responding to which advertising or product promotions, which areas have the most activity for certain types of orders, or how many domains are covered per type of account. And the list goes on.
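A faceted query of the kind described above can be expressed with Solr's standard facet parameters. This is a sketch only; the promotion and field names are invented for illustration:

```python
from urllib.parse import urlencode

# Hypothetical Solr select handler URL.
SOLR_SELECT = "http://solr-host:8983/solr/select"

def facet_query(q, facet_fields, rows=0):
    """Build a Solr query that returns value counts for each facet field.

    rows=0 suppresses the document list so only facet counts come back.
    """
    params = [("q", q), ("rows", rows), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", f) for f in facet_fields]
    return SOLR_SELECT + "?" + urlencode(params)

# Which postal codes and order types responded to a (hypothetical) promotion?
url = facet_query("promo_code:SPRING10", ["postal_code", "order_type"])
```

Solr's response would then include a count per postal code and per order type, which is exactly the kind of breakdown the Schema Browser hints at interactively.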
Operationally speaking, the Solr instances were kept current in one of two ways: periodic updates from the main production instances, or continual updates in which the application code not only added data to the Oracle database but inserted the corresponding documents into the Solr index as well. Either way, operations against the existing production instances could be managed to minimize impact and eliminate any unnecessary processing.
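For the continual-update path, the application can render each new record as a Solr XML update message and POST it to the update handler in the same code path that inserts the Oracle row. A minimal sketch of building that message (field names are hypothetical):

```python
from xml.sax.saxutils import escape

def solr_add_xml(doc):
    """Render one document dict as a Solr <add> XML update message."""
    fields = "".join(
        '<field name="%s">%s</field>' % (escape(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields

# Build the update message for a newly inserted (hypothetical) order row.
xml = solr_add_xml({"id": "42", "order_type": "BUSINESS", "postal_code": "30303"})
# The application would POST this to /solr/update, followed by a commit,
# alongside the corresponding INSERT into Oracle.
```

Batching several documents per `<add>` and committing periodically, rather than per document, keeps the overhead on both systems low.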
Conclusion
With these new capabilities, answers to key questions can be found in seconds. Data can be mined quickly, efficiently, and flexibly without a lot of specialized training for business users. Additionally, the indexes can be managed so that additional data can be added to increase the scope of analysis, or so that subsets of data can be indexed and searched for specific business reasons such as service outages or legal matters.
