Latest Entries »

Document Library Part 1

So I’m working on a personal project, a document library.  Myself and several other people have tens of thousands of documents that we would like to archive online.  So I started thinking about how I could build this.  Basically I want a person to upload a document (pdf, text, etc.) and fill in info about it and then TAG it with anything relevant to the document such as people’s names, locations, etc.

So the first step for me, of course, was how to lay  the database out.   I started with the documents table which contains the following fields:

Doc_ID (primary key)

Title (title of the document)

Description (A semi-long description of the document)

Abstract (if the document is only text vs. a pdf, for example, that text would go here.)

filename (the name of the file uploaded)

filetype (the file type of the document)

document_date (the date of the actual document)

timestamp (date/time when the document was uploaded)

User_ID (id of the user who uploaded the document)

 

Then I figured I would need a user table

User_ID

Name

Email

Password

timestamp

 

Now onto the tags.   There were several things to consider here. I’d like to use ajax in the site to auto suggest tags to try to keep people from misspelling etc.  However it is inevitable that people will still enter tags incorrectly.  To maintain the integrity of the tagging system I want to make sure if a “new” tag is entered that it is marked as new so it can be reviewed and then either approved or corrected with the appropriate tag.   So first I need a Tags table:

Very simple:

Tag_ID (Primary key)

Tag (the actual tag)

Approved (bit column 0/1 so when a new tag comes in it’s marked w/ a 0 until approved)

 

Next is the document tags table that links the document up with the tag.

Doc_ID

Tag_ID

 

In the next post I will go over the ColdFusion portion of entering the document and tags.   Ultimately with it set up this way, it will create a way to click on a Tag and pull up all other documents that are related to that Tag, which is the ultimate goal here.

Forms in Email

Working on a customer site for CF Webtools.  They have an HTML email that goes out to people who fill out a form on their site. In the email they have a form with a bunch of hidden fields, if you click the Submit button it takes you to a page and creates a printable version of an application.

Their request was to create a text version of this email since some users did not have email programs that supported HTML. I used some suggestions that Wil posted here on creating the text version.  I tested the text version email and it worked great.

I decided to test the HTML version as well, it appeared to be fine, however each time I clicked the submit button it took me to an error page.  I looked into why the error was being thrown and apparently the form variables were not being passed. I use Thunderbird, from what I can tell Thunderbird will not pass form variables within an email to your browser. I’m sure there are other email clients that also have this problem.  I suggested we just nix the “form” all together, since the user isn’t actually filling anything out in the email.  So I removed the form and replaced it with an embedded link.

Moral of the story, don’t use forms in email unless you want to exclude some of your audience. :)

The Category Saga Continues

Ok so my last post was all about creating a script that would loop through and find all sub, sub-sub, sub-sub-sub  categories and label them with a main_id which was ulimately the top-level catgory of the tree, rather than the parent.

Now I have to create a script that drills up instead of down.  I have to start with the lowest level sub-category and see if it contained products, if it does not, I mark it as having no products.  I want categories that have no products NOT to display.  This is where it gets tricky. Only the lowest level sub categories contain products. So if I have

Category A
  Sub Category A
     Sub-Sub Category AA
     Sub-Sub Category AB
  Sub Category B
     Sub-Sub Category BA
     Sub-Sub Category BB

Only Sub-subCategories AA, AB, BA, and BB would have Products since they are the lowest level categories. So, if Sub-Sub AA has no products then when viewing the categories on the front-end of the website, Sub A  should only display Category Sub-Sub AB. However, let’s say Sub-Sub BA and Sub-Sub BB have no products, then ultimately neither does Sub B so all 3 of those categories would have to be marked as having no products and when viewing Category A on the front end of the site, Sub B would not display So this gets a little tricky and requires drilling up from the lowest level category to the parent and marking the appropriate categories.  Basically if I find a product in a subcategory, I mark that category has having products as well as the parent.

So here is the script I created to do this, this script is run once a day when the categories are imported and then I just filter categories using the hasproducts field:


<cfquery datasource="#application.ds#" name="getcats">  
   select  distinct a.category, a.parent_id
   from  products b
   inner join  tablex x on  b.id = x.prod_id
   inner join categories a on a.category = x.category
  where a.main_id is not null
  
 UNION
 
 select  distinct a.category, a.parent_id
 from  categories a
 inner join tableb b  on   a.category = b.category
 Where  a.main_id is not null
</cfquery>

<cfloop query="getcats">
 <cfquery datasource="#application.ds#" name="update">
   update categories
   set hasproducts = <cfqueryparam value="1" cfsqltype="CF_SQL_INTEGER" />
   where category =  <cfqueryparam value="#getcats.category#" cfsqltype="CF_SQL_INTEGER" />
  <cfif getcats.parent_id neq 0>
    OR category =   <cfqueryparam value="#getcats.parent_id#" cfsqltype="CF_SQL_INTEGER" />
  </cfif> 
 </cfquery>
 <cfif getcats.parent_id neq 0>
  <cfquery datasource="#application.ds#" name="getparent">
    select parent_id
    from categories
    where category = <cfqueryparam value="#getcats.parent_id#" cfsqltype="CF_SQL_INTEGER" />
  </cfquery>
  <cfif getparent.parent_id neq 0>
   <cfquery datasource="#application.ds#" name="update">
     update categories
     set hasproducts = <cfqueryparam value="1" cfsqltype="CF_SQL_INTEGER" />
     where category = <cfqueryparam value="#getparent.parent_id#" cfsqltype="CF_SQL_INTEGER" />
   </cfquery>
   <cfquery datasource="#application.ds#" name="getparent2">
     select parent_id
     from categories
     where category = <cfqueryparam value="#getparent.parent_id#" cfsqltype="CF_SQL_INTEGER" />
   </cfquery>
   <cfif getparent2.parent_id neq 0>
    <cfquery datasource="#application.ds#" name="update">
      update categories
      set hasproducts = <cfqueryparam value="1" cfsqltype="CF_SQL_INTEGER" />
      where category = <cfqueryparam value="#getparent2.parent_id#" cfsqltype="CF_SQL_INTEGER" />
    </cfquery>
    <cfquery datasource="#application.ds#" name="getparent3">
      select parent_id
      from categories
      where category = <cfqueryparam value="#getparent2.parent_id#" cfsqltype="CF_SQL_INTEGER" />
    </cfquery>
    <cfif getparent3.parent_id neq 0>
     <cfquery datasource="#application.ds#" name="update">
       update categories
       set hasproducts = <cfqueryparam value="1" cfsqltype="CF_SQL_INTEGER" />
       where category = <cfqueryparam value="#getparent3.parent_id#" cfsqltype="CF_SQL_INTEGER" />
     </cfquery>
     <cfquery datasource="#application.ds#" name="getparent4">
       select parent_id
       from categories
       where category = <cfqueryparam value="#getparent3.parent_id#" cfsqltype="CF_SQL_INTEGER" />
     </cfquery>
     <cfif getparent4.parent_id neq 0>
      <cfquery datasource="#application.ds#" name="update">
        update categories
        set hasproducts = <cfqueryparam value="1" cfsqltype="CF_SQL_INTEGER" />
        where category = <cfqueryparam value="#getparent4.parent_id#" cfsqltype="CF_SQL_INTEGER" />
      </cfquery>
      <cfquery datasource="#application.ds#" name="getparent5">
        select parent_id
        from categories
        where category = <cfqueryparam value="#getparent5.parent_id#" cfsqltype="CF_SQL_INTEGER" />
      </cfquery>
      <cfif getparent5.parent_id neq 0>
       <cfquery datasource="#application.ds#" name="update">
         update categories
         set hasproducts = <cfqueryparam value="1" cfsqltype="CF_SQL_INTEGER" />
         where category = <cfqueryparam value="#getparent5.parent_id#" cfsqltype="CF_SQL_INTEGER" />
       </cfquery>
      </cfif>
     </cfif>
    </cfif>
   </cfif>
  </cfif>
 </cfif>
</cfloop>

So today I worked on a project for CFWebTools.  The site I was working on has products and categories. The categories have subcategories that can go 4 levels deep.  When someone searches on the site for a term like “ipod”, right now all the results that include ipod are returned.   On the results page they want a select box with all the top-level categories listed that have ipod with in that category (and it’s sub-categories) along with a count. Similar to how Amazon does it.

So in the select box you would have something like this:

Apple (10)

Games (5)

Electronics (2)

Now category Apple might have a sub-category of Ipod, Ipod might have a sub-category of Headphones, Headphones might have a sub-category of… well you get the idea.   So the way things are set up right now, in order to get an accurate count of products containing the term i-pod within the Apple category and any of the sub-categories, I would have to do 5 nested queries.  Not only would I have to find the count, but determine what the top-level category is on a sub-category that might be 4 levels deep. Ugh…

So I suggested I create a script that we can run when needed (like once a day since Categories are imported daily).  The script will do the necessary nested query madness  and label all the subcategories with a main_id, which is ultimately the id of the top-level category.    They all have a parent_id but the parent is simply one level up, so if it is a 4th level sub category, the parent_id isn’t going to tell us what the top-level category is.  By adding in a field of a main_id in the database we can create a much quicker count and even optimize some other areas of the websites.

So, I added one field to the database called main_id.

Then I run this script you see below (takes about 4 seconds). You will see several if statements that check to see if the record count is greater than 0. I had to do this because I ran into sub-categories that were orphaned so there ends up not being a parent. This alleviates that problem and lets the script run without error.  Once this script runs, we also now know all of the sub-categories that are orphaned because their main_id will be null.  * I do use cfqueryparams on all queries just removed some from the sample since the indenting was already insane.


<cfquery datasource=#application.ds# name="getcats" >
 update categories set main_id = null
</cfquery>
<cfquery datasource=#application.ds# name="getcats" >
 select Category,parent_id from categories  order by parent_id,category
</cfquery>

<cfoutput query="getcats">
 <cfif parent_id eq 0>
  <cfset parent = category>
 <cfelse>
  <cfquery datasource=#application.ds# name="checkparent">
   select category,parent_id from categories where category = <cfqueryparam value="#getcats.parent_id#" cfsqltype="CF_SQL_INTEGER" />
  </cfquery>
  <cfif checkparent.recordcount gt 0>
   <cfif checkparent.parent_id eq 0>
    <cfset parent=checkparent.category>
   <cfelse>
    <cfquery datasource=#application.ds# name="checkparent2">
     select category,parent_id from categories where category = <cfqueryparam value="#checkparent.parent_id#" cfsqltype="CF_SQL_INTEGER" />
    </cfquery>
    <cfif checkparent2.recordcount gt 0>
     <cfif checkparent2.parent_id eq 0>
      <cfset parent=checkparent2.category>
     <cfelse>
      <cfquery datasource=#application.ds# name="checkparent3">
       select category,parent_id from categories where category = <cfqueryparam value="#checkparent2.parent_id#" cfsqltype="CF_SQL_INTEGER" />
      </cfquery>
      <cfif checkparent3.recordcount gt 0>
       <cfif checkparent3.parent_id eq 0>
        <cfset parent=checkparent3.category>
       <cfelse>
        <cfquery datasource=#application.ds# name="checkparent4">
         select category,parent_id from categories where category = <cfqueryparam value="#checkparent3.parent_id#" cfsqltype="CF_SQL_INTEGER" />
        </cfquery>
        <cfif checkparent4.recordcount gt 0>
         <cfif checkparent4.parent_id eq 0>
          <cfset parent=checkparent4.category>
         <cfelse>
          <cfquery datasource=#application.ds# name="checkparent5">
           select category,parent_id from categories where category = <cfqueryparam value="#checkparent4.parent_id#" cfsqltype="CF_SQL_INTEGER" />
          </cfquery>
          <cfif checkparent5.recordcount gt 0>
           <cfif checkparent5.parent_id eq 0>
            <cfset parent=checkparent5.category>
           </cfif>
          <cfelse>
           <cfset parent = 0> 
          </cfif>
         </cfif>
        <cfelse>
         <cfset parent = 0>
        </cfif>
       </cfif>
      <cfelse> 
       <cfset parent = 0>
      </cfif>
     </cfif>
    <cfelse>
     <cfset parent = 0>
    </cfif>
   </cfif>
  <cfelse>
   <cfset parent = 0> 
  </cfif>
 </cfif>
 <cfif parent neq 0>
  <cfquery datasource="#application.ds#">
   update categories
   set main_id = <cfqueryparam value="#parent#" cfsqltype="CF_SQL_INTEGER" />
   where category =  <cfqueryparam value="#getcats.category#" cfsqltype="CF_SQL_INTEGER" />
  </cfquery>
 </cfif>
</cfoutput>

So now I can create one query that will pull a count of a given search term from all sub-categories within a top-level category and come back with one count per top-level category. No nesting of queries. Runs much faster on the live site then the nesting would.


  SELECT
     count(cat.main_id) as count, cat.main_id, cat2.name
  FROM
     products p
  INNER JOIN
     table_b b on p.id = b.prod_id
  INNER JOIN
     table_c c on c.categoryid = b.custcat_id
  INNER JOIN
     categories cat on c.categoryid = cat.categoryid
  INNER JOIN
     categories cat2 on cat2.main_id = cat2.categoryid
  WHERE
     p.prodname LIKE <cfqueryparam cfsqltype="cf_sql_varchar" value="%#trim(myTerm[ter])#%" />
 GROUP BY
     cat.main_id, cat2.name

New Job

I recently got hired by CFWebTools, I’ve really enjoyed my first week here. I’ve been doing internet development with a focus on ColdFusion and SQL since 1996. My boss is Mark Kruger, a.k.a ColdFusion Muse. If you aren’t reading his blog, you should be!  From time to time I will be posting about ColdFusion, SQL, jQuery, CSS and other web development related topics.

Follow

Get every new post delivered to your Inbox.