I am using a modified jQuery Thesaurus in one of my projects, and I want to explain my solution for the case where the terms database is huge.
The original code provides the following functionality:
1. checks the content of an HTML node in a webpage for terms against a dictionary
2. marks the found terms and constructs links for AJAX calls to the term definitions
3. on mouseover, builds a tooltip on the fly and populates it with the term definition
My modified version uses a JSON feed instead of the default DB controller, but that is not the subject of this article.
The JS code waits for the page to finish loading, then downloads (or loads from the DB) the full list of terms as a JavaScript object. If the database contains a large number of terms, the speed of execution drops until the tool becomes unusable. There are reports that with more than 400-500 terms the solution is already out of the question.
Here is my solution to this problem. My assumption is that the content of any webpage should be much smaller than a list of terms from a database with several thousand entries (or even 130k entries, as mentioned in the report above). In that case it makes sense to pass the page text to the DB controller and filter the list of terms down to only those that actually exist in the target webpage.
Therefore I have modified the project in two places (the code was updated to address the issue mentioned in the first comment):
1. Change this function as follows:
/**
 * 1) Loads list of terms from server
 * 2) Searches terms in DOM
 * 3) Marks up found terms
 * 4) Binds event handlers to them
 */
bootstrap : function() {
    var words = ''; // must start as an empty string, otherwise the first += yields "undefined ..."
    var list;
    $.each(this.options.containers, $.proxy(function(i, node) {
        // Strip digits and punctuation from the container text, collapse whitespace
        words += " " + $(node).text()
            .replace(/[\d!;:<>.=\-_`~@*?,%"'(){}]/g, ' ')
            .replace(/\s+/g, ' ');
        // Keep every word only once to shorten the query string
        list = removeDuplicates(words.split(" "));
        words = list.join(" ");
    }, this));
    $.getJSON(this.options.JSON_TERMS_URI + '&words=' + encodeURIComponent(words) + '&callback=?',
        $.proxy(function(data) {
            this.terms = this._processResponseTerm(data);
            $.each(this.options.containers, $.proxy(function(i, node) {
                this._searchTermsInDOM(node);
                this._markup(node);
            }, this));
            this.bindUI('body');
        }, this));
},
You can see that I am accessing my JSON feed instead of the DB controller, but this does not matter; the idea remains the same. I am passing the text extracted from the containers declared in the Thesaurus options.
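Note that removeDuplicates() is not a jQuery function but a small helper that deduplicates the word list before it is sent to the server. A minimal version, for illustration (any equivalent deduplication will do), could look like this:

function removeDuplicates(list) {
    // Keeps the first occurrence of each word and drops empty entries;
    // the comparison is case-insensitive, since the server-side match
    // (stristr) ignores case anyway.
    var seen = {};
    var result = [];
    for (var i = 0; i < list.length; i++) {
        var key = list[i].toLowerCase();
        if (key && !seen[key]) {
            seen[key] = true;
            result.push(list[i]);
        }
    }
    return result;
}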
2. Filter the terms in the DB controller (the syntax below is for generating a JSON feed):
$tgs = $this->get_thesaurus(); // array of all terms in the dictionary
$words = $_GET['words'];       // list of unique words from the target page
$tags = array();
foreach ($tgs as $tag) {
    $list = explode(" ", $tag); // split each term into its words
    foreach ($list as $word) {
        // Check whether any word of the term appears in the page text.
        // stristr() is a cheap substring match, so it can be slightly
        // over-inclusive ("cat" matches "catalog"); the precise matching
        // still happens on the client side, so this only means a few
        // extra terms in the response.
        if (stristr($words, $word)) {
            $tags[] = $tag;
            break;
        }
    }
}
return array( // returned to the feed generator, which encodes it as JSON
    'count' => count($tags),
    'tags'  => $tags
);
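With this filter in place, the feed returns only the matching terms. Assuming the feed wraps the payload in the JSONP callback requested via callback=?, the response would look roughly like this (someCallback stands for the callback name jQuery generates; the data is purely illustrative):

someCallback({
    "count": 2,
    "tags": ["gradient", "gradient descent"]
});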
By using this method, the dictionary loaded into the JavaScript object shrinks back to a small number of terms, and the speed of the solution is no longer compromised. It is true that for webpages with massive content the full list of words cannot be sent to the server (the JSONP request is a plain GET, so it is subject to URL length limits), but in most cases this solution will work well.
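If you do want to guard against very long pages, one simple mitigation is to cap the word list before it exceeds common URL length limits. A rough sketch, to be placed right before the $.getJSON call in bootstrap() (MAX_QUERY_LENGTH is my own name, and 1800 is an assumption based on the frequently cited ~2000-character URL limit of older browsers); the price is that terms occurring only in the truncated tail of the page will not be highlighted:

var MAX_QUERY_LENGTH = 1800; // assumption: stay below the ~2000-char URL limit
if (words.length > MAX_QUERY_LENGTH) {
    words = words.substr(0, MAX_QUERY_LENGTH);
    words = words.substr(0, words.lastIndexOf(" ")); // cut at a word boundary
}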