1.9 Orphaned htmlpage SQL

Description

I know 1.9 is quickly becoming obsolete, and that 2.0 really solves the whole orphaned file issue.  However, from time-to-time we still have clients come to us with orphaned files.  Here are the queries I use to find/fix them.   I wronte these in SQL Server, but they can be applied to each db engine with little modification.  Note that these will not work in 2.0 because the data mode for this stuff is compeletey changed.

Snippet #1 - If you know the page(s)'s uri(s) you first need to find the page identifiers.  This qurey should help here.  Here the inode column is the page's identifier.

Snippet #2 - Join to the inode and htmlpage tables to make sure there are inodes and working versions. The idea here is that you should get the same number of results as pages you are looking for.  If you are good here you can skip to Snippet #5

Snippet #3 - If you didn't get a working version for each page, check for any versions by taking off the and working=1 part in Snippet #2 and make sure each page has at least 1 version that is not deleted.  If a page does not have a working version in the htmlpage or inode table, then you should delete the identifier from db so you can recreate the page, it is not recoverable.  From there we can determine the latest version of each htmlpage with this query.  The Where that is is commented out in the example would be the identifiers of the pages you found in Snippet #1 (you could use a subquery here), but if oyu leave them off this query will return any html page that has the latest version not the working version of the htmlpage.  It is important to note that not all of this is bad, you can have an "older" version be the working version.  Check and make sure the page doesn't already have a working version first before you fix it.

Snippet #4 - This is how to make the latest version the working version.  You would put the inodes you got in snippet #3 inside the in clause here.  You should run Snippet #2 again to make sure each page has only 1 working version.

Snippet #5 - Now that all of your pages have 1 and only 1 working version we need to make sure that they have templates.  Again here you want 1 record returned for each page.  If you don't have a record returned for each page, then you need to insert a record into the tree table where the child is the page's working inode and the parent is the corrent template identifier.  Relation type here is parentPageTemplate. Tree order is always 0.

Snippet #6 - Now that your templates are in place, let's make sure that the tree has proper pointers to the identifiers for each page.  This query builds on the last.  Same thing here, one record per page.  Again, if you don't get a record for a page, then you need to insert a record into the tree where child is the page's working inode and the parent is the page's identifier. relation type here is child.

Snippet #7 - Almost done, but this is where 99% of the orphaned pages end up.  This query checks for the folder entry in the tree.  If one of these is missing you need to go to the folder table and the folder's inode and get a record in the tree table where the child is the page's working inode and the parent is the folder's inode.  Relation type here is child.

I suppose you could jump straight to Snippet #7, but I find that it is harder to tell what is missing this way.  Another hint here for 5-7 is to make the regular joins to the inode table (i2, i3), left joins instead. Then you can check for i2.inode is null and find the records that missing.  

Hope it helps someone, I know I have to rewrite them everytime I work on one of these problems and I've done it enough times now that I felt it was worth writing down and sharing. 

Aren't I glad 2.0 makes these errors pretty near impossible.

Code

-- Snippet ${esc.hash}1 - Find Identifiers:
select inode,uri from identifier
where uri in (
  '/path/to/page1.dot',
  '/path/to/page2.dot');

-- Snippet ${esc.hash}1 - Check for Working Versions:
select d.inode as id,uri,i.inode,live,working
from identifier d, inode i, htmlpage h
where uri in (
  '/path/to/page1.dot',
  '/path/to/page2.dot')
and working=1 and deleted=0;

-- Snippet ${esc.hash}3 - Check for Latest Versions;
select d.uri, d.inode as id,
  i.inode, i.type, i.idate,
  h.title, working, live, deleted
from (
select identifier, MAX(idate) as maxdate
from inode
--where identifier in ('123711','123695')
group by identifier
) mi
  inner join inode i on mi.identifier = i.identifier and mi.maxdate=i.idate
  inner join identifier d on mi.identifier = d.inode
  inner join htmlpage h on i.inode = h.inode
where working=0 and deleted=0;

-- Snippet ${esc.hash}4 - Fix htmlpages with no working versions
update htmlpage set live=1,working=1 where inode in ('12345','23456');

-- Snippet ${esc.hash}5 - Check for templates:
select d.inode as id, d.uri, i.inode, h.live, h.working,
  t1.parent as tree_template,
  t2.parent as tree_identifier
from identifier d, inode i, htmlpage h
  left join tree t1 on h.inode=t1.child and t1.relation_type='parentPageTemplate' 
where uri in (
  '/path/to/page1.dot',
  '/path/to/page2.dot'
) and d.inode = i.identifier and i.inode = h.inode
and h.working=1
order by uri;

-- Snippet ${esc.hash}6 - Check for identifiers in the tree:
select d.inode as id, d.uri, i.inode, h.live, h.working,
  t1.parent as tree_template
from identifier d, inode i, htmlpage h
  left join tree t1 on h.inode=t1.child and t1.relation_type='parentPageTemplate'
  left join tree t2 on h.inode=t2.child and t2.relation_type='child'
  join inode i2 on t2.parent=i2.inode and i2.[type] = 'identifier'
where uri in (
  '/path/to/page1.dot',
  '/path/to/page2.dot'
) and d.inode = i.identifier and i.inode = h.inode
and h.working=1
order by uri;

-- Snippet ${esc.hash}7 - Check for folders in the tree:
select d.inode as id, d.uri, i.inode, h.live, h.working,
  t1.parent as tree_template,
  t2.parent as tree_identifier,
  t3.parent as tree_folder, f.[path]
from identifier d, inode i, htmlpage h
  left join tree t1 on h.inode=t1.child and t1.relation_type='parentPageTemplate' 
  left join tree t2 on h.inode=t2.child and t2.relation_type='child'
  join inode i2 on t2.parent=i2.inode and i2.[type] = 'identifier'
  left join tree t3 on h.inode=t3.child and t3.relation_type='child'
  join inode i3 on t3.parent=i3.inode and i3.[type] = 'folder'
where uri in (
  '/path/to/page1.dot',
  '/path/to/page2.dot'
) and d.inode = i.identifier and i.inode = h.inode
and h.working=1
order by uri;