So today I worked on a project for CFWebTools. The site I was working on has products and categories. The categories have subcategories that can go 4 levels deep. When someone searches on the site for a term like “ipod”, right now all the results that include ipod are returned. On the results page they want a select box with all the top-level categories listed that have ipod with in that category (and it’s sub-categories) along with a count. Similar to how Amazon does it.
So in the select box you would have something like this:
Apple (10)
Games (5)
Electronics (2)
Now category Apple might have a sub-category of Ipod, Ipod might have a sub-category of Headphones, Headphones might have a sub-category of… well you get the idea. So the way things are set up right now, in order to get an accurate count of products containing the term i-pod within the Apple category and any of the sub-categories, I would have to do 5 nested queries. Not only would I have to find the count, but determine what the top-level category is on a sub-category that might be 4 levels deep. Ugh…
So I suggested I create a script that we can run when needed (like once a day since Categories are imported daily). The script will do the necessary nested query madness and label all the subcategories with a main_id, which is ultimately the id of the top-level category. They all have a parent_id but the parent is simply one level up, so if it is a 4th level sub category, the parent_id isn’t going to tell us what the top-level category is. By adding in a field of a main_id in the database we can create a much quicker count and even optimize some other areas of the websites.
So, I added one field to the database called main_id.
Then I run this script you see below (takes about 4 seconds). You will see several if statements that check to see if the record count is greater than 0. I had to do this because I ran into sub-categories that were orphaned so there ends up not being a parent. This alleviates that problem and lets the script run without error. Once this script runs, we also now know all of the sub-categories that are orphaned because their main_id will be null. * I do use cfqueryparams on all queries just removed some from the sample since the indenting was already insane.
<cfquery datasource=#application.ds# name="getcats" >
update categories set main_id = null
</cfquery>
<cfquery datasource=#application.ds# name="getcats" >
select Category,parent_id from categories order by parent_id,category
</cfquery>
<cfoutput query="getcats">
<cfif parent_id eq 0>
<cfset parent = category>
<cfelse>
<cfquery datasource=#application.ds# name="checkparent">
select category,parent_id from categories where category = <cfqueryparam value="#getcats.parent_id#" cfsqltype="CF_SQL_INTEGER" />
</cfquery>
<cfif checkparent.recordcount gt 0>
<cfif checkparent.parent_id eq 0>
<cfset parent=checkparent.category>
<cfelse>
<cfquery datasource=#application.ds# name="checkparent2">
select category,parent_id from categories where category = <cfqueryparam value="#checkparent.parent_id#" cfsqltype="CF_SQL_INTEGER" />
</cfquery>
<cfif checkparent2.recordcount gt 0>
<cfif checkparent2.parent_id eq 0>
<cfset parent=checkparent2.category>
<cfelse>
<cfquery datasource=#application.ds# name="checkparent3">
select category,parent_id from categories where category = <cfqueryparam value="#checkparent2.parent_id#" cfsqltype="CF_SQL_INTEGER" />
</cfquery>
<cfif checkparent3.recordcount gt 0>
<cfif checkparent3.parent_id eq 0>
<cfset parent=checkparent3.category>
<cfelse>
<cfquery datasource=#application.ds# name="checkparent4">
select category,parent_id from categories where category = <cfqueryparam value="#checkparent3.parent_id#" cfsqltype="CF_SQL_INTEGER" />
</cfquery>
<cfif checkparent4.recordcount gt 0>
<cfif checkparent4.parent_id eq 0>
<cfset parent=checkparent4.category>
<cfelse>
<cfquery datasource=#application.ds# name="checkparent5">
select category,parent_id from categories where category = <cfqueryparam value="#checkparent4.parent_id#" cfsqltype="CF_SQL_INTEGER" />
</cfquery>
<cfif checkparent5.recordcount gt 0>
<cfif checkparent5.parent_id eq 0>
<cfset parent=checkparent5.category>
</cfif>
<cfelse>
<cfset parent = 0>
</cfif>
</cfif>
<cfelse>
<cfset parent = 0>
</cfif>
</cfif>
<cfelse>
<cfset parent = 0>
</cfif>
</cfif>
<cfelse>
<cfset parent = 0>
</cfif>
</cfif>
<cfelse>
<cfset parent = 0>
</cfif>
</cfif>
<cfif parent neq 0>
<cfquery datasource="#application.ds#">
update categories
set main_id = <cfqueryparam value="#parent#" cfsqltype="CF_SQL_INTEGER" />
where category = <cfqueryparam value="#getcats.category#" cfsqltype="CF_SQL_INTEGER" />
</cfquery>
</cfif>
</cfoutput>
So now I can create one query that will pull a count of a given search term from all sub-categories within a top-level category and come back with one count per top-level category. No nesting of queries. Runs much faster on the live site then the nesting would.
SELECT
count(cat.main_id) as count, cat.main_id, cat2.name
FROM
products p
INNER JOIN
table_b b on p.id = b.prod_id
INNER JOIN
table_c c on c.categoryid = b.custcat_id
INNER JOIN
categories cat on c.categoryid = cat.categoryid
INNER JOIN
categories cat2 on cat2.main_id = cat2.categoryid
WHERE
p.prodname LIKE <cfqueryparam cfsqltype="cf_sql_varchar" value="%#trim(myTerm[ter])#%" />
GROUP BY
cat.main_id, cat2.name