So today I worked on a project for CFWebTools. The site I was working on has products and categories. The categories have subcategories that can go 4 levels deep. When someone searches on the site for a term like “ipod”, right now all the results that include ipod are returned. On the results page they want a select box with all the top-level categories listed that have ipod with in that category (and it’s sub-categories) along with a count. Similar to how Amazon does it.
So in the select box you would have something like this:
Apple (10)
Games (5)
Electronics (2)
Now category Apple might have a sub-category of Ipod, Ipod might have a sub-category of Headphones, Headphones might have a sub-category of… well you get the idea. So the way things are set up right now, in order to get an accurate count of products containing the term i-pod within the Apple category and any of the sub-categories, I would have to do 5 nested queries. Not only would I have to find the count, but determine what the top-level category is on a sub-category that might be 4 levels deep. Ugh…
So I suggested I create a script that we can run when needed (like once a day since Categories are imported daily). The script will do the necessary nested query madness and label all the subcategories with a main_id, which is ultimately the id of the top-level category. They all have a parent_id but the parent is simply one level up, so if it is a 4th level sub category, the parent_id isn’t going to tell us what the top-level category is. By adding in a field of a main_id in the database we can create a much quicker count and even optimize some other areas of the websites.
So, I added one field to the database called main_id.
Then I run this script you see below (takes about 4 seconds). You will see several if statements that check to see if the record count is greater than 0. I had to do this because I ran into sub-categories that were orphaned so there ends up not being a parent. This alleviates that problem and lets the script run without error. Once this script runs, we also now know all of the sub-categories that are orphaned because their main_id will be null. * I do use cfqueryparams on all queries just removed some from the sample since the indenting was already insane.
<cfquery datasource=#application.ds# name="getcats" > update categories set main_id = null </cfquery> <cfquery datasource=#application.ds# name="getcats" > select Category,parent_id from categories order by parent_id,category </cfquery> <cfoutput query="getcats"> <cfif parent_id eq 0> <cfset parent = category> <cfelse> <cfquery datasource=#application.ds# name="checkparent"> select category,parent_id from categories where category = <cfqueryparam value="#getcats.parent_id#" cfsqltype="CF_SQL_INTEGER" /> </cfquery> <cfif checkparent.recordcount gt 0> <cfif checkparent.parent_id eq 0> <cfset parent=checkparent.category> <cfelse> <cfquery datasource=#application.ds# name="checkparent2"> select category,parent_id from categories where category = <cfqueryparam value="#checkparent.parent_id#" cfsqltype="CF_SQL_INTEGER" /> </cfquery> <cfif checkparent2.recordcount gt 0> <cfif checkparent2.parent_id eq 0> <cfset parent=checkparent2.category> <cfelse> <cfquery datasource=#application.ds# name="checkparent3"> select category,parent_id from categories where category = <cfqueryparam value="#checkparent2.parent_id#" cfsqltype="CF_SQL_INTEGER" /> </cfquery> <cfif checkparent3.recordcount gt 0> <cfif checkparent3.parent_id eq 0> <cfset parent=checkparent3.category> <cfelse> <cfquery datasource=#application.ds# name="checkparent4"> select category,parent_id from categories where category = <cfqueryparam value="#checkparent3.parent_id#" cfsqltype="CF_SQL_INTEGER" /> </cfquery> <cfif checkparent4.recordcount gt 0> <cfif checkparent4.parent_id eq 0> <cfset parent=checkparent4.category> <cfelse> <cfquery datasource=#application.ds# name="checkparent5"> select category,parent_id from categories where category = <cfqueryparam value="#checkparent4.parent_id#" cfsqltype="CF_SQL_INTEGER" /> </cfquery> <cfif checkparent5.recordcount gt 0> <cfif checkparent5.parent_id eq 0> <cfset parent=checkparent5.category> </cfif> <cfelse> <cfset parent = 0> </cfif> </cfif> <cfelse> <cfset parent = 0> </cfif> </cfif> <cfelse> <cfset parent = 0> </cfif> </cfif> <cfelse> <cfset parent = 0> </cfif> </cfif> <cfelse> <cfset parent = 0> </cfif> </cfif> <cfif parent neq 0> <cfquery datasource="#application.ds#"> update categories set main_id = <cfqueryparam value="#parent#" cfsqltype="CF_SQL_INTEGER" /> where category = <cfqueryparam value="#getcats.category#" cfsqltype="CF_SQL_INTEGER" /> </cfquery> </cfif> </cfoutput>
So now I can create one query that will pull a count of a given search term from all sub-categories within a top-level category and come back with one count per top-level category. No nesting of queries. Runs much faster on the live site then the nesting would.
SELECT count(cat.main_id) as count, cat.main_id, cat2.name FROM products p INNER JOIN table_b b on p.id = b.prod_id INNER JOIN table_c c on c.categoryid = b.custcat_id INNER JOIN categories cat on c.categoryid = cat.categoryid INNER JOIN categories cat2 on cat2.main_id = cat2.categoryid WHERE p.prodname LIKE <cfqueryparam cfsqltype="cf_sql_varchar" value="%#trim(myTerm[ter])#%" /> GROUP BY cat.main_id, cat2.name