Ghost Assets Cleanup on Solr

Overview

Ghost Assets have the following properties:

  1. They exist in the Solr indexes.
  2. They do not exist in the Telescope database (editorial and doc_renditions table).

Assets with these traits will turn up in search results in TeleScope but clicking on the assets brings up an error dialog with the message: An error occurred and was handled. Details about the application event that caused the error to occur have been saved in the log file.
ghostasset.png

 

Solution

Pre-requisites

<supportagent>

- Telescope On-Demand Customer Environment Details: You may need to access the database & Solr instances

</supportagent>

- SQL Server Management Studio (SSMS)

Root Cause

Ghost Assets can be created in different ways - some examples are the following cases:

  • Deleted records do not get deleted but show up in SOLR searches.
  • Solr indexes the placeholder record when a file fails to ingest.

The number of Ghost Assets can be calculated by checking how many more records does Solr indexes have over the database of assets. You can check how many Ghost Assets are present by checking how many assets are in Solr by performing a Show All Assets search from TeleScope as an Admin, and checking how many assets are in the database by noting the result of running SELECT COUNT(record_id) FROM editorial in the database.
The difference between these two values tells you how many Ghost Assets are currently in the system.

Steps To Fix

This issue was fixed in TeleScope Standard 2020.4. As part of this fix, when you visit a list of assets and if there is any ghost asset in that view, TeleScope will make note of it and delete it such that it won't be visible the next time you visit the same page. If your TeleScope installation has customizations or you're an OnDemand customer, please contact your Account Manager or support for further assistance with upgrading.

However, note that upgrading won't delete any Ghost Assets that were formed prior to upgrading. To have Ghost Assets deleted from earlier or just delete ghost assets until you upgrade, follow the steps in the Alternative Solution section below. OnDemand customers should instead raise a support ticket for assistance with this.

Alternative Solution

To delete ghost assets you need the record_ids of the assets you want to delete from Solr. Two methods are outlined below to find the ids of ghost assets, and you may need to pick one method depending on if you are required to delete all ghost assets or specific ones. The process to delete will be the same for both of the discovery methods.

Find all Ghost Assets:

  1. In the Solr Query Builder, set fl = id and do a query to find out how many total records are in Solr - the number next to numFound. You may need to do this for every core in Solr.
  2. Take the number next to numFound, place it into the following query, replacing NUMFOUND with the actual value. If needed, replace recs with the core being used such recs2, recs3, etc.
    http://localhost:8983/solr/recs/select?q=*%3A*&rows=NUMFOUND&fl=id&wt=csv
    This should give you a list of all the current record_id's in Solr.
  3. You can then compare(diff) this to the list of record_id's in the editorial with the tool of your choice (Excel, bash, etc...)
    • If there is something in Solr that is not in editorial, it is a ghost asset.
    • If there is something in the editorial database table that is not in Solr, then I would suggest possibly a full reindex or a manual insert if there are not too many records into search_index_actions. However, this should not occur in our experience. Check search_index_actions to ensure that it is empty (i.e.: no indexing actions are pending).

Find Specific Ghost Assets:

  1. Log in to the production environment and view all the assets by clicking on Assets in TeleScope Web.
  2. Ghost assets will not have any information displayed next to their thumbnail, and all metadata will be missing.
  3. Inspect the HTML for a particular ghost asset listing by right-clicking on the ghost asset and selecting Inspect(in Chrome) or Inspect Element(in Firefox). This action should bring up a window in the browser as shown here.
    assetid.png
  4. Copy the numeric part from the id key of every tr element. That is the record ID of the ghost asset.
  5. Make a list of the record_ids of the ghost assets.
  6. Check that these record_ids do not exist in the database. If they exist in the database, it means that the records are not ghost assets! Remove such record_ids from the list of ghost assets.
    Example of a query to use to check:
    SELECT * FROM editorial WHERE record_id IN (371684, 371665, 371663);

Delete Ghost Assets by record_id:

  1. RDP into the client's Solr server to delete the record on Solr.
    To ensure that you are on the Solr server, verify if you can access the Solr Console, by typing localhost:8983/solr on the machine in a browser. Note that 8983 is the default Solr port, and the port in use can be found by running ./bin/solr.cmd status from the Solr install directory in the terminal which generates an output as below:
    Found 1 Solr nodes: 
    
    Solr process 9713 running on port 8983    
  2. To find out the core on which the asset is available, perform a search for every asset against every core in Solr.
    1. Select the core from the dropdown
      selectcore.png
    2. Search for the record in that core
      abc.png
    3. Note down the core for the record_id for the next step.
  3. On the Solr server, type the following in the browser, replacing CORENAME with the name of the core discovered in the step above:
    http://localhost:8983/solr/CORENAME/update?stream.body=%3Cdelete%3E%3Cquery%3Eid:RECORDIDHERE%3C/query%3E%3C/delete%3E&commit=true
    Replace RECORDIDHERE with the actual record_id as shown below.deleterecord.png

 

Testing

Check that the file(s) has been deleted from Solr. A search in the Solr indexes should return 0 results. This process has been shown in the images below.

gacus-006.png

gacus-007.png

Comments

0 comments

Article is closed for comments.