Scribus Image Cache

This page gives some details about the Scribus image cache manager implemented in ScImageCacheManager. The image cache manager, accompanied by a number of helper classes, is responsible for caching low-resolution versions of images used in Scribus documents.

As the loading of images and their conversion to low resolution consumes a lot of time, the image cache helps to massively speed up the loading of images that have been previously loaded under the same conditions. It will also speed up operations like undoing or redoing image effects.

The image cache was designed to be accessible simultaneously by multiple instances of Scribus. It should even be possible to share the cache over a network drive, although this will surely degrade performance.

File Types in the Image Cache

All files stored in the cache are either short XML documents or real image files. PNG has been chosen as the image format, as it offers good compression and is a lossless format. At low compression levels, it is also quite fast.

There are quite a lot of properties in Scribus that have an influence on how an image will be rendered on the screen. These are mainly color management and image effects. These properties will be called modifiers in this text.

Image information comprises properties directly associated with the on-disk image file, e.g. resolution or EXIF data. As the original image is not read when fetching images from the cache, this data needs to be cached as well.

Meta information, finally, is information describing the cache entry. It contains properties like the on-disk image file path, the image file size or the last modification date. It is used to determine whether an image can be fetched from the cache or must be reloaded from its original file.
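
The validity check based on meta information can be sketched as follows. This is a minimal illustration in Python, not the Scribus implementation; the function names and the dictionary layout are assumptions made for this example, and only path, size and modification time are compared.

```python
import os

def make_meta(path):
    """Record the properties used to recognise an unchanged source image.
    (Hypothetical layout; the real meta file is an XML document.)"""
    st = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
    }

def is_still_valid(meta):
    """Return True if the source image still matches the recorded meta
    information, i.e. the cached version may be used."""
    try:
        st = os.stat(meta["path"])
    except OSError:
        return False  # source image is gone or unreadable
    return st.st_size == meta["size"] and st.st_mtime_ns == meta["mtime_ns"]
```

If the check fails, the image has to be reloaded from its original file and the cache entry rewritten.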

-------------------------------------------------------------------------------

   Meta File (.xml)            Reference File (.ref)       Image File (.png)

  .-----------------.         .-----------------.         .-----------------.
  |meta information |-------->|reference count  |         |cached image     |
  |modifiers        |         |                 |-------->|                 |
  |image information|    .--->|                 |         |                 |
  '-----------------'    |    '-----------------'         '-----------------'
                         |
  .-----------------.    |
  |meta information |    |
  |modifiers        |----'
  |image information|
  '-----------------'

-------------------------------------------------------------------------------

As different combinations of modifiers may end up producing exactly the same image in the cache, multiple meta files may reference the same image file. To keep track of the number of references, each cached image is accompanied by a reference file that differs from the image file only by the file suffix.
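
The relationship between meta files and shared image files can be modelled with a simple reference counter. The following Python sketch keeps everything in memory and uses hypothetical names; in the cache itself the count lives in the reference file on disk.

```python
class RefCountedImages:
    """In-memory model: several meta entries may point at one cached image."""

    def __init__(self):
        self.meta_to_image = {}  # meta hash -> image hash
        self.ref_counts = {}     # image hash -> number of referencing meta entries

    def add_meta(self, meta_hash, image_hash):
        """Register a meta entry. Returns True if this is the first reference,
        i.e. the image file itself still has to be written."""
        if meta_hash in self.meta_to_image:
            raise KeyError("meta entry already registered: " + meta_hash)
        self.meta_to_image[meta_hash] = image_hash
        self.ref_counts[image_hash] = self.ref_counts.get(image_hash, 0) + 1
        return self.ref_counts[image_hash] == 1

    def remove_meta(self, meta_hash):
        """Drop a meta entry. Returns True if the image is no longer
        referenced and may be deleted together with its reference file."""
        image_hash = self.meta_to_image.pop(meta_hash)
        self.ref_counts[image_hash] -= 1
        if self.ref_counts[image_hash] == 0:
            del self.ref_counts[image_hash]
            return True
        return False
```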

To avoid races between multiple instances of Scribus, possibly even running on different machines, all files must be accessed atomically when the cache is being modified. Non-modifying, read-only accesses are always allowed.

This is usually achieved by the following mechanisms:

  • Lock files (with an additional suffix ".lock") are created by the instance that wishes to modify an entry in the cache. Only the instance that has successfully acquired all necessary lock files may modify the cache. As it is close to impossible to atomically create a file in a platform-independent way, the lock file is actually implemented as a lock directory. See ScLockedFile for details.
  • All files that are created or modified are first written as temporary files. Only once a file has been written completely is the old version unlinked and the new version renamed to its final name.

This ensures that an instance that only wishes to read from the cache can safely do so even without caring about locked files.

Furthermore, in order to avoid any deadlocks or delays, locking only makes sure that only one instance writes to the cache at a time. If another instance fails to get the necessary locks, it will simply skip the whole cache access.
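
The two mechanisms and the non-blocking behaviour can be sketched together. This is a Python illustration under stated assumptions, not the ScLockedFile code: the helper names are hypothetical, and os.replace swaps the new file in with a single rename instead of a separate unlink and rename.

```python
import os
import tempfile

def try_lock(path):
    """Try to lock `path` by creating the directory '<path>.lock'.
    mkdir either creates the directory or fails, so only one caller wins."""
    try:
        os.mkdir(path + ".lock")
        return True
    except FileExistsError:
        return False

def unlock(path):
    os.rmdir(path + ".lock")

def write_if_unlocked(path, data):
    """Write `data` (bytes) to `path` via a temporary file.
    Returns False immediately, without waiting, if the lock is taken."""
    if not try_lock(path):
        return False
    try:
        # Create the temporary file next to the target so the rename
        # stays on the same filesystem.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, path)  # readers see the old or the new file, never a partial one
        except BaseException:
            os.unlink(tmp)
            raise
    finally:
        unlock(path)
    return True
```

A caller that gets False back simply continues without the cache, which is why readers and competing writers never block each other.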

Directory Structure

Each cache file is uniquely identified by a hexadecimal MD5 hash. The first two hex digits represent two levels of subdirectories and the remainder forms the start of the file's basename, for example:

  $(HOME)/.scribus/cache/img/a/e/15c5160668926e4a7c593a813a0d68.xml
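
The mapping from hash to path can be sketched in a few lines of Python. The function names are hypothetical, and what exactly is fed into the hash is not specified here, so the `key` string is an assumption.

```python
import hashlib
import os

def digest_for_key(key):
    """MD5 hex digest (32 hex digits) of an identifying string.
    The content of `key` is an assumption made for this sketch."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()

def path_for_digest(cache_root, digest, suffix):
    """First hex digit -> first directory level, second hex digit -> second
    level, remaining 30 digits plus suffix -> file name."""
    return os.path.join(cache_root, digest[0], digest[1], digest[2:] + suffix)
```

For the example above, path_for_digest(root, "ae15c5160668926e4a7c593a813a0d68", ".xml") yields the a/e/15c5160668926e4a7c593a813a0d68.xml entry below the cache root.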

Within each folder of the cache structure, there is an additional access file that keeps track of write accesses to this folder. The purpose of this file is to notify other instances of Scribus when entries in the cache have been modified. The file simply contains a counter that is incremented with each write access to the cache. The file also serves as a lock for the directory. Instead of locking individual files in the cache, locking the access files is sufficient.

Cache Housekeeping

Each instance doing any write access to the cache will first create its own lock file in:

  $SCRIBUS/cache/img/locks/

The name does not matter. After successful creation of this file, the instance checks for the presence of the master lock file

  $SCRIBUS/cache/img/locks/master.lock

If this is present, the instance will remove its own lock file and will not initiate any write accesses to the cache.

An instance wishing to do a cleanup will attempt to create the master lock file. If it succeeds, it will check that no other lock files are present in the lock directory. If other lock files are present, it will remove the master lock file and not perform a cleanup. If no other lock files are present, the instance has exclusive write access to all cache files.
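
The handshake between writers and a cleaning instance can be sketched as follows. This Python illustration uses hypothetical function names and random lock names; locks are created as directories, in line with the lock directory approach described earlier.

```python
import os
import uuid

MASTER = "master.lock"

def begin_write(lock_dir):
    """Create an own lock first, then check for the master lock.
    Returns the own lock path, or None if a cleanup is in progress."""
    own = os.path.join(lock_dir, uuid.uuid4().hex + ".lock")
    os.mkdir(own)
    if os.path.exists(os.path.join(lock_dir, MASTER)):
        os.rmdir(own)  # back off, a cleanup has priority
        return None
    return own

def end_write(own):
    os.rmdir(own)

def begin_cleanup(lock_dir):
    """Create the master lock first, then check for other locks.
    Returns True only if this instance now has exclusive write access."""
    master = os.path.join(lock_dir, MASTER)
    try:
        os.mkdir(master)
    except FileExistsError:
        return False  # another instance is already cleaning up
    others = [name for name in os.listdir(lock_dir) if name != MASTER]
    if others:
        os.rmdir(master)  # writers are active, give up
        return False
    return True

def end_cleanup(lock_dir):
    os.rmdir(os.path.join(lock_dir, MASTER))
```

Because each side creates its own lock before looking for the other's, a writer and a cleaner that start at the same time cannot both proceed; at least one of them sees the other's lock and backs off.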

After each write operation, a cache cleanup is performed if necessary. That is, if the cache limits (number of meta files or total cache size) are exceeded, the oldest meta files are removed until the cache is within the user-defined limits again.
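
The eviction rule can be sketched as a small Python function. The tuple layout and the function name are hypothetical, and attributing one size to each meta file is a simplification: the real cache also has to account for image files shared between several meta files.

```python
def select_evictions(meta_files, max_count, max_total_size):
    """Pick the meta files to remove, oldest first, until both limits hold.

    meta_files: list of (name, mtime, size) tuples (assumed layout).
    Returns the names to remove in removal order.
    """
    remaining = sorted(meta_files, key=lambda m: m[1])  # oldest first
    total = sum(size for _, _, size in remaining)
    evicted = []
    while remaining and (len(remaining) > max_count or total > max_total_size):
        name, _, size = remaining.pop(0)
        total -= size
        evicted.append(name)
    return evicted
```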

In the Scribus startup phase, if a master lock can be acquired, the instance will also sanitize the cache. This includes operations like removing any orphaned files or fixing reference counters.

Keeping the Cache Image up-to-date

The cache image is the cache manager's internal representation of all files in the on-disk cache. It is a tree of ScImageCacheDir and ScImageCacheFile objects. The ScImageCacheDir objects emit signals when files in the cache are updated. These signals drive additional operations in the cache manager like updating the total cache size or the meta age list that keeps track of the oldest meta files in the cache.

Each time a Scribus instance performs a write to the cache, it attempts to acquire a master lock in order to remove old files if necessary. Other instances might also have modified the cache in the meantime, so it is mandatory to update the cache image beforehand.

However, instead of rescanning the whole cache structure, the cache manager only looks for changes to the access files. If a change has been detected in one directory, its subdirectories are checked recursively. Only directories that have been modified by other running instances of Scribus need to be rescanned. So, in the most common case of only one Scribus instance running at a time, no rescans have to be performed.
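
The selective rescan can be sketched as follows. This Python illustration makes several assumptions: the access file is called "access" (a hypothetical name), it holds the counter as plain text, and a write bumps the counter in the affected directory and in every directory above it.

```python
import os

ACCESS_NAME = "access"  # hypothetical name of the per-directory access file

def read_counter(directory):
    """Read the write counter of a directory; a missing file counts as 0."""
    try:
        with open(os.path.join(directory, ACCESS_NAME)) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0

def find_modified_dirs(directory, known, modified=None):
    """Return the directories whose counter differs from the last scan.

    known: dict mapping directory path -> counter seen last time; it is
    updated in place. An unchanged directory is skipped with its subtree.
    """
    if modified is None:
        modified = []
    current = read_counter(directory)
    if known.get(directory) == current:
        return modified
    known[directory] = current
    modified.append(directory)
    for entry in sorted(os.listdir(directory)):
        sub = os.path.join(directory, entry)
        if os.path.isdir(sub):
            find_modified_dirs(sub, known, modified)
    return modified
```

With a single running instance the counters always match what that instance wrote itself, so the function returns after reading one access file.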

There is one case, however, where the cache image is not kept up-to-date. Whenever a read-only access to the cache is performed, the corresponding meta file is touched to prevent it from being deleted when the cache is cleaned up. This operation does not directly trigger an update of the cache image. Updating the modification timestamp is delayed until a cache cleanup becomes necessary. Before the oldest meta file is actually deleted from the cache, its timestamp is checked and it will only be removed if it is still the oldest file in the cache. Otherwise, its position in the MetaAgeList will be updated. The main driver behind this is that cache reads should be cheap and not require any locking. Any change to the cache image, by contrast, would need to be reflected in the access files, which would in turn require locking.
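
The delayed timestamp check can be sketched with a priority queue. This is a Python illustration, not the MetaAgeList implementation; the heap layout and the `current_mtime` callback are assumptions made for the example.

```python
import heapq

def pop_oldest(age_heap, current_mtime):
    """Return the name of the meta file that is really the oldest.

    age_heap: heap of (recorded_mtime, name) tuples.
    current_mtime: callable mapping a name to its timestamp on disk,
                   or None if the file no longer exists.
    Entries touched by a reader since they were recorded are put back
    with their new timestamp instead of being removed.
    """
    while age_heap:
        recorded, name = heapq.heappop(age_heap)
        actual = current_mtime(name)
        if actual is None:
            continue  # file already gone, drop the stale entry
        if actual > recorded:
            heapq.heappush(age_heap, (actual, name))  # update its position
            continue
        return name
    return None
```

Reads therefore only need to touch the meta file; the cost of correcting the age list is paid by the instance that performs the cleanup.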

Accessing the Image Cache

To access the image cache, a ScImageCacheProxy object is needed. It provides all necessary functionality to read and write images in the cache. See pageitem.cpp and scimage.cpp for examples.

Internally, write accesses to the cache are bracketed with the help of an ScImageCacheWriteAction object. This object is being notified of all files that participate in the cache access and will carry out all necessary locking, updating of the access files and notifying the cache manager of any changes.

Performance Measurements

The following table shows wall-clock ("wall") and real CPU ("real") times for loading different documents in Scribus. In most cases, documents have been loaded multiple times. Before the first load, the filesystem cache was completely flushed.

As can be seen, there is no difference in load times if the cache is disabled in the Scribus preferences. This is important for users who wish to disable the cache (for whatever reasons).

Also, first load times are not severely longer if the cache is enabled. In the worst case, the first load time was less than 20% longer. Most of that time is spent compressing and writing the cache images, which can be seen in the last rows of the table. If the images are already found in the cache and only the meta files have to be created, the load time is almost equivalent to the load time without cache support.

However, load times are significantly shorter for the second and third load of the document. The third load is even faster because the cache files are likely to still be present in the filesystem cache. Usually, re-loading a document is 20 to 50 times faster with the cache enabled. All measurements were done with medium resolution (72 dpi) cache image files and with the default cache image compression level of 1. Raising the compression level beyond 4 will mainly slow down the first load of images. Setting it to zero will significantly increase the cache file size.

-------------------------------------------------------------------------------
                    trunk original      trunk with cache    trunk with cache
                    no cache support    cache disabled      cache enabled
-------------------------------------------------------------------------------
                    wall      real      wall      real      wall      real
-------------------------------------------------------------------------------

1 page document
5 small images
on local disk

  1. load             9.802     2.060     9.661     2.070     9.517     2.610
  2. load             1.802     1.640     1.812     1.750     0.585     0.530
  3. load             1.744     1.630     1.709     1.630     0.547     0.470

266 page document
2.6 GiB of TIFFs
on local disk
(CMS disabled)

  1. load           235.080   194.850   233.662   192.800   277.886   226.750
  2. load           226.178   191.510   225.340   191.920    11.276     9.270
  3. load           227.191   191.580                         7.632     7.320

160 page document
2.6 GiB of TIFFs
on network drive
(CMS enabled)

  1. load           979.028   602.100                      1011.878   631.290
  1. load [1]                                               985.161   608.530
  2. load           972.407   599.280                        34.296    25.860
  3. load                                                    18.605    18.310

-------------------------------------------------------------------------------
 [1] image files already found in cache
-------------------------------------------------------------------------------