deaddabe

Using and extending the {attach} directive in Pelican

When you use a static website generator, managing images is hard, especially when you regularly write content and add new pictures, as on this blog. Fortunately, the Pelican static generator has a special directive that we can use (and extend?) to group images into each article's folder.

One directory per article

I have chosen to output each article into its own directory. Coupled with an index.html file, this allows you to access articles with the following link structure:

https://deaddabe.fr/blog/2020/08/30/foo-bar/

When you access this URL, the web server looks for an index.html file in this folder and serves it to you. The alternative would be to store each article directly in its own HTML file instead of a folder:

https://deaddabe.fr/blog/2020/08/30/foo-bar.html

These two examples are common URL schemes for blogs and news sites. I prefer to use directories like in the first example, because:

  • The URL looks nicer: the .html extension is replaced by a simple trailing slash.
  • The created directory can be used to store all the media related to the article, as we will see in this article.

To activate this feature in Pelican, add the following settings to your pelicanconf.py file:

# Article URL
ARTICLE_SAVE_AS = "blog/{date:%Y}/{date:%m}/{date:%d}/{slug}/index.html"
ARTICLE_URL = "blog/{date:%Y}/{date:%m}/{date:%d}/{slug}/"

# Page URL
PAGE_ORDER_BY = 'attribute'
PAGE_SAVE_AS = "{slug}/index.html"
PAGE_URL = "{slug}/"

# Index URL
INDEX_SAVE_AS = "blog/index.html"
INDEX_URL = "blog/"

Not only articles, but also pages and the index, are configured to use this naming scheme.
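Pelican fills these templates with each article's metadata. As a rough illustration, and not Pelican's own code, plain Python str.format produces the same paths, because datetime objects accept strftime codes such as %Y inside format specifiers. The date and slug below are assumed values for a hypothetical foo-bar article:

```python
from datetime import datetime

# Templates copied from the pelicanconf.py settings above.
ARTICLE_SAVE_AS = "blog/{date:%Y}/{date:%m}/{date:%d}/{slug}/index.html"
ARTICLE_URL = "blog/{date:%Y}/{date:%m}/{date:%d}/{slug}/"

# Assumed metadata for a hypothetical article published on 2020-08-30.
metadata = {"date": datetime(2020, 8, 30), "slug": "foo-bar"}

# datetime.__format__ applies strftime, so {date:%m} becomes "08".
save_as = ARTICLE_SAVE_AS.format(**metadata)
url = ARTICLE_URL.format(**metadata)

print(save_as)  # blog/2020/08/30/foo-bar/index.html
print(url)      # blog/2020/08/30/foo-bar/
```

The two results differ only by the trailing index.html, which is what lets the web server answer the directory URL with the generated file.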

Storing media in the same output folder

Our articles are now stored in directories named after them. Next, we can try to move their related media into these folders. For this, let's look at the Pelican documentation. On the Writing content page, we can read the following:

Note: Placing static and content source files together in the same source directory does not guarantee that they will end up in the same place in the generated site. The easiest way to do this is by using the {attach} link syntax (described below).

This is exactly what we want to achieve.

Let’s have a look at this {attach} link syntax in closer detail:

Starting with Pelican 3.5, static files can be “attached” to a page or article using this syntax for the link target: {attach}path/to/file.

This works like the {static} syntax, but also relocates the static file into the linking document’s output directory. If the static file originates from a subdirectory beneath the linking document’s source, that relationship will be preserved on output. Otherwise, it will become a sibling of the linking document.

The second paragraph is very important to understand. The {attach} directive has two different behaviors, depending on where the attached files are located relative to the article:

  • If the attachment is in a subfolder of the article’s source directory, the subfolder structure is kept in the output location.
  • Otherwise, the attachment is put in the same directory as the article’s output file.

Let’s consider an article located at articles/foo-bar.rst. It will be generated as the articles/foo-bar/index.html output file.

Now, if we attach an image located at:

  • articles/images/example.jpg, then the output image will be located at articles/foo-bar/images/example.jpg;
  • images/example.jpg, then the output image will be located at articles/foo-bar/example.jpg.
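To make the rule concrete, here is a small model of it in Python. It is a simplified sketch written from the documentation quoted above, not Pelican's implementation, and attach_output_path is a hypothetical name. Paths are relative to the content root, and the article's output directory is passed in explicitly:

```python
from pathlib import PurePosixPath


def attach_output_path(article_src, article_out_dir, attachment_src):
    """Hypothetical model of where {attach} puts a file (not Pelican's code).

    All arguments are POSIX-style paths relative to the content root.
    """
    source_dir = PurePosixPath(article_src).parent
    attachment = PurePosixPath(attachment_src)
    out_dir = PurePosixPath(article_out_dir)
    try:
        # Beneath the article's source directory: keep the subdirectories.
        return str(out_dir / attachment.relative_to(source_dir))
    except ValueError:
        # Anywhere else: the file becomes a sibling of the article's output.
        return str(out_dir / attachment.name)


article = "articles/foo-bar.rst"
out_dir = "articles/foo-bar"  # where articles/foo-bar/index.html is written

print(attach_output_path(article, out_dir, "articles/images/example.jpg"))
# articles/foo-bar/images/example.jpg
print(attach_output_path(article, out_dir, "images/example.jpg"))
# articles/foo-bar/example.jpg
```

Feeding it other attachment paths is a quick way to check where a file will land before running a full build.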

Grouping media per article

In these two examples, the resulting output structure is fine. However, there is a problem with the input structure: the images for all articles end up in the same folder. This makes it a headache to figure out which image is used in which article, and it can also cause naming conflicts.

One solution to this problem is to group the images into directories named after each article, such as images/foo-bar/*.jpg for the articles/foo-bar.rst document. This way, it is easy to recognize which image is used in which article.

Now, let’s reconsider the example with the same article location, but with a per-article subdirectory. If we attach an image located at:

  • articles/images/foo-bar/example.jpg, then the output image will be located at articles/foo-bar/images/foo-bar/example.jpg;
  • images/foo-bar/example.jpg, then the output image will be located at articles/foo-bar/example.jpg.

You can now see why the second approach is the correct one. If we put the images directory inside the articles directory, next to the article sources, then the output path of the image contains the article’s slug twice. This repetition is unwelcome.

By putting the images directory one level up, outside of the article’s source directory, we rely on the {attach} behavior that drops the subdirectory structure and puts the attachment right next to the article.

Homemade Pelican rST preprocessor plugin

Our images can now easily be put into per-article directories and included in articles without risking any conflict. On the output side, they are placed in the article’s directory, keeping things ordered.

[Image: 6 clothes pins arranged in a star shape]

One last problem with our setup is that including the files is not ergonomic. To attach them in the foo-bar.rst article, you need to use the following reStructuredText directive:

.. image:: {attach}../images/foo-bar/example.jpg
    :alt: An example image.
    :target: {attach}../images/foo-bar/example.jpg
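The ../images/foo-bar/ prefix is simply the image path expressed relative to the directory that contains foo-bar.rst, since {attach} links without a leading slash are resolved relative to the current file. Assuming the articles/ and images/ layout used so far, Python's posixpath module can derive it:

```python
import posixpath

# Assumed layout, relative to the content root, as in the examples above.
article_src = "articles/foo-bar.rst"
image_src = "images/foo-bar/example.jpg"

# The link is written relative to the article's source directory.
article_dir = posixpath.dirname(article_src)  # "articles"
link = posixpath.relpath(image_src, start=article_dir)

print("{attach}" + link)  # {attach}../images/foo-bar/example.jpg
```

Only the foo-bar part changes from one article to the next, which is what makes the prefix tedious to type by hand.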

Typing the {attach}../images/foo-bar/ path each time is repetitive. Ideally, we would just type example.jpg and let Pelican infer the directory name for us. Doing so would mean modifying the framework in very complex ways.

What we can do, however, is write a preprocessor plugin for Pelican that reads the .rst file and modifies it on the fly, before passing it to the real reStructuredText parser.

The input before the preprocessing would be:

.. image:: {dir_attach}example.jpg
    :alt: An example image.
    :target: {dir_attach}example.jpg

After preprocessing, the reStructuredText source passed to the parser would be:

.. image:: {attach}../images/foo-bar/example.jpg
    :alt: An example image.
    :target: {attach}../images/foo-bar/example.jpg

Writing a reStructuredText preprocessor is not really covered by the Pelican documentation. I dug into the Pelican source code, as well as other Pelican plugins, before managing to write code that effectively preprocesses the source files.

This code is covered by (passing) doctests, which helped me design it and verify that the output was generated correctly. Regexes are necessary here: a plain replacement of every {dir_attach} string would make this article impossible to write, since all occurrences, including those in the explanations, would have been replaced.
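The difference is easy to demonstrate on a small sample. This sketch assumes a hypothetical destination prefix for foo-bar and uses two anchored patterns similar to the plugin's: a naive str.replace also rewrites the mention in prose, while the anchored version only touches the directive line and its option line.

```python
import re

# Hypothetical destination prefix for an article named foo-bar.
DEST = "{attach}../images/foo-bar/"

source = (
    "Write {dir_attach} before the file name.\n"
    ".. image:: {dir_attach}example.jpg\n"
    "    :target: {dir_attach}example.jpg\n"
)

# Naive approach: every occurrence is rewritten, including the prose one.
naive = source.replace("{dir_attach}", DEST)

# Anchored approach: only ".. name::" lines and four-space-indented
# ":option:" lines are rewritten.
patterns = (
    r"^(\.\. \w+::) \{dir_attach\}",
    r"^(    :\w+:) \{dir_attach\}",
)
anchored = source
for pattern in patterns:
    anchored = re.sub(
        pattern,
        lambda m: f"{m.group(1)} {DEST}",  # a function avoids escape handling
        anchored,
        flags=re.MULTILINE,
    )

print(naive.splitlines()[0])     # Write {attach}../images/foo-bar/ before the file name.
print(anchored.splitlines()[0])  # Write {dir_attach} before the file name.
```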

The plugin code concludes this article, so you do not need to scroll past it if you want to skip its roughly 110 lines. I hope it will encourage Pelican blog enthusiasts to use this media organisation scheme. I certainly will, as I do like taking photos.

"""dir_attach, a Pelican plugin to attach files from a directory named after
reStructuredText file names.

If your file is named ``foobar.rst``, then the following
reStructuredText entry:

    .. image:: {dir_attach}example-thumbnail.jpg
        :alt: An example image for demonstration.
        :target: {dir_attach}example.jpg

Will automatically be pre-processed by this plugin to become:

    .. image:: {attach}../images/foobar/example-thumbnail.jpg
        :alt: An example image for demonstration.
        :target: {attach}../images/foobar/example.jpg

Using the ``../images/`` directory instead of ``images/`` is mandatory. This
is because Pelican only flattens the attachment's output path when the
attached file is not located beneath the current page's source directory.

If the ``images/`` directory were used, then the resulting path in HTML
would be:

    foobar/images/foobar/example.jpg

By using a directory outside of the page's source folder, the generated HTML
path is simplified by Pelican:

    foobar/example.jpg

"""

import os
import tempfile
import re

from pelican import signals
from pelican.readers import RstReader


def dirname_from_source_path(source_path):
    """Get the dirname to use from the .rst filename.

    >>> dirname_from_source_path("/tmp/foobar.rst")
    'foobar'

    """
    return os.path.splitext(os.path.basename(source_path))[0]


def expand_dir_attach(content, dirname):
    """Expand the {dir_attach} directives to image {attach} directives.

    >>> src = (
    ...     ".. image:: {dir_attach}example.jpg\\n"
    ...     "    :alt: Example illustration.\\n"
    ...     "    :target: {dir_attach}example.jpg\\n"
    ... )
    >>> print(expand_dir_attach(src, "foobar"), end="")
    .. image:: {attach}../images/foobar/example.jpg
        :alt: Example illustration.
        :target: {attach}../images/foobar/example.jpg

    """

    dest = f"{{attach}}../images/{dirname}/"

    # Match only reStructuredText directive lines and their indented options,
    # instead of running content.replace() on the whole text, so that
    # {dir_attach} mentions inside paragraphs are left untouched.
    # The dots and braces are escaped so they match literally.
    regexes = (
        (r"^(\.\. \w+::) \{dir_attach\}(.*)$", fr"\1 {dest}\2"),
        (r"^(    :\w+:) \{dir_attach\}(.*)$", fr"\1 {dest}\2"),
    )

    for match, replace in regexes:
        content = re.sub(match, replace, content, flags=re.MULTILINE)

    return content


class CustomRstReader(RstReader):
    """A custom reStructuredText reader that pre-processes the source first."""

    enabled = True
    file_extensions = ['rst']

    def read(self, source_path):
        """Pre-process the source, then parse it as reStructuredText."""
        dirname = dirname_from_source_path(source_path)

        # Use text mode with an explicit UTF-8 encoding: mode "w" alone would
        # use the locale's default encoding, while Pelican's RstReader tells
        # docutils to decode its input as UTF-8 by default.
        with tempfile.NamedTemporaryFile(mode="w", encoding="utf-8") as tmp:
            with open(source_path, encoding="utf-8") as src:
                tmp.write(expand_dir_attach(src.read(), dirname))

            # Force flush to disk before docutils tries to open the file
            # in super().read(), otherwise the file may be empty.
            tmp.flush()

            return super().read(tmp.name)


def add_reader(readers):
    """Override the .rst reader with our custom reader."""
    readers.reader_classes['rst'] = CustomRstReader


def register():
    """Register the plugin to Pelican."""
    signals.readers_init.connect(add_reader)
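To enable the plugin, Pelican's PLUGIN_PATHS and PLUGINS settings can be used. The snippet below is a sketch that assumes the code above is saved as plugins/dir_attach.py, in a plugins directory next to pelicanconf.py:

```python
# pelicanconf.py (excerpt)
# Assumption: the plugin code is saved as plugins/dir_attach.py,
# in a "plugins" directory next to this configuration file.
PLUGIN_PATHS = ["plugins"]
PLUGINS = ["dir_attach"]
```

The doctests can also be run with python -m doctest -v plugins/dir_attach.py, which requires Pelican to be installed, since the module imports it.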