nw-page-editor

nw-page-editor is an application for viewing/editing ground truth or predicted information for diverse purposes related to the areas of document processing and text recognition. The edition is done interactively and visually on top of images of scanned documents. Additionally the app supports many keyboard shortcuts to allow more efficient editing, see section Application usage shortcuts.

The app is available in two variants. The first variant is as a desktop application based on the NW.js framework thus making it cross-platform. The second variant is as a web application that allows remote editing by multiple users and can be easily setup via a docker container.

Supported formats

The format used for storing the information is the omni:us Pages Format hightly based on the PRImA Page XML schema, but only supports a subset of elements and has a few minor extensions.

New XML files created by the app are saved as the latest version of the omni:us schema. Modifications to existing XML files using older omni:us schemas or PRImA schemas since version 2013-07-15 are saved using the original schema. Note that this might mean that the saved XMLs are no longer valid according to that schema. Additionally the app supports importing XMLs from other formats, namely: PRImA 2010-03-19, alto v2 and v3, tet and poppler's pdftotext xhtml. Many details from imported formats might be lost and there is no functionality to export back to those formats.

Desktop variant

After correct installation of the desktop app, the standard way in Linux and Mac to start the app is through the command line, normally specifying which file(s) to open. In windows currently it is not possible to open files directly from the command line, so you need to start the app and then select a file to open.

Command line synopsis

nw-page-editor [*page.xml*]+ [*pages_dir*]+ [--list *pages_list*]+ [--css *file.css*]+ [--js *file.js*]+

Example Page XML files

You can find example Page XML files in the nw-page-editor source code repository, the directory examples. Thus, you can open the examples as:

nw-page-editor examples/*.xml

Desktop variant installation

Linux/Mac:

  1. Download the SDK version of the NW.js package appropriate for your platform from http://nwjs.io/downloads. Extract it to a location where you store applications and add to your PATH the directory containing the nw|nwjs binary: root of package in Linux or nwjs.app/Contents/MacOS in OSX. Alternatively to setting the PATH it is also possible to simply symlink the nw|nwjs executable to a directory already in your path.
  2. Clone the nw-page-editor repository (using the --recursive option) to a location where you store applications and add the package's bin directory to your PATH. Like in the previous step, alternatively you can simply symlink the bin/nw-page-editor executable to a directory already in your path.

Windows:

In the github repository's releases there are for selected version prebuilt packages for windows. To install, go to https://github.com/mauvilsa/nw-page-editor/releases and download and extract the latest nw-page-editor-win-x64.zip file. To open the app, execute the nw-page-editor.exe and for convenience create a shortcut to open the app.

If you are interested in the latest version for which there is no prebuilt package, follow the following instructions:

  1. Download the SDK version of the NW.js package appropriate for your platform from http://nwjs.io/downloads. Extract it to a location where you store applications renaming the base directory to nw-page-editor.
  2. Move, copy or clone the files of this github repository such that the file package.json is in the same directory as nw.exe. If cloning be sure that you use the --recursive option.
  3. For convenience rename the nw.exe executable to nw-page-editor.exe and create a shortcut to it in your desktop to ease opening the app.

Notes:

  • The reason to install the SDK version of NW.js is to allow inspection of elements using the Chrome DevTools and for example be able to do Page XML modifications not implemented in the app.

Desktop variant startup CSS and JavaScript files

The --css and --js command line options can be used to modify the appearance or execute custom code on startup. For example, the following JavaScript code changes the position, color and size of overlayed text, and prints to a pdf.

setTimeout( function() {    $.stylesheet('#page_styles { #xpg .TextEquiv }').css( 'fill', 'green' ); /* Set font color to green */    $.stylesheet('#page_styles { #xpg .TextEquiv }').css( 'font-size', '60px' ); /* Set font size to 60px */    pageCanvas.cfg.textPositionOffset = [ 10, 0 ]; /* Set text position x and y offsets */    pageCanvas.util.positionText(); /* Update text position */    nw.Window.get().print({pdf_path:loadedFile.replace(/\.xml$/,'.pdf')}); /* Print to pdf */}, 500 );

To run this example save this code snippet to a file test.js and start the editor from the command line as

nw-page-editor --js test.js examples/lorem.xml

Web server variant

The page editor can also be used as a web server allowing multiple users to edit page xmls remotely. Moreover, the server can be configured so that all the history of changes of the page xmls are saved in a git repository with commits associated to the respective users. To ease the installation and usage of the web server version, docker is used.

Web server variant installation

The steps for installation are the following:

  1. Install docker in the server and for convenience configure it so that sudo is not required to run containers.
  2. Either pull the latest image of nw-page-editor-web from docker hub by choosing one of the available tags or build the docker image from the source.

## Pull from docker hub ##TAG="YOUR CHOSEN TAG HERE"docker pull mauvilsa/nw-page-editor-web:$TAG ## Build docker image from source ##TAG="local"docker build -t mauvilsa/nw-page-editor-web:$TAG .

  1. Create a directory for the data and copy the images and page xml that will be available to access remotely. It is highly recommended to make the data directory a git repository so that change history is kept.

## Create directory for data ##mkdir data ## Copy documents, e.g. the examples in nw-page-editor source ##cp $NW_PAGE_EDITOR_SOURCE/examples/* data ## Create list of xmls if use of lists is needed ##ls data/*.xml | sed 's|^data/||' | head -n 2 > data/xmls.lst ## Init data directory as git repo ##git init data

  1. By default the web app can be accessed without authentication in which case all git commits are associated to the anonymous user. Alternatively you can create users with passwords to restrict the access to the app and associate commits to different people. This is done by creating the file data/.htpasswd using the htpasswd tool. This tool might not be available in the server, but it is included in the nw-page-editor-web docker image. To ease the usage of htpasswd within this image, it is recommended that you use the docker-cli script. Just download it into some directory included in your PATH and then do the following.

## Create users and passwords ##docker-cli -- mauvilsa/nw-page-editor-web:$TAG htpasswd -cb data/.htpasswd [email protected] pass1 docker-cli -- mauvilsa/nw-page-editor-web:$TAG htpasswd -b data/.htpasswd [email protected] pass2 ## For more details on htpasswd usage ##docker-cli -- mauvilsa/nw-page-editor-web:$TAG htpasswd --help

  1. Start a container exposing the web server port 80 to a port of your preference (e.g. 8080) and set the data directory as a volume.

docker run --rm -d -p 8080:80 \ -v $(pwd)/data:/var/www/nw-page-editor/data \ mauvilsa/nw-page-editor-web:$TAG

  1. The Page XMLs can be accessed using URLs like the example ones below. Three different GET parameters can be used to specify which documents to view. In all cases relative paths with respect to the data directory must be given. The three supported parameters are:
  • f: to specify a single xml file
  • l: to specify a list of xml files
  • d: to specify a directory
http://$SERVER_ADDRESS:8080/app?f=lorem.xml
http://$SERVER_ADDRESS:8080/app?l=xmls.lst
http://$SERVER_ADDRESS:8080/app?d=.

Data owner, group and permissions

By default the files created/modified by the web server will use a default umask and have the same owner and group as the data directory. These defaults can be overridden by defining the DATA_UMASK, DATA_UID and DATA_GID environment variables.

docker run --rm -d -p 8080:80 \ -e DATA_UMASK=007 -e DATA_UID=1234 -e DATA_GID=5000 \ -v $(pwd)/data:/var/www/nw-page-editor/data \ mauvilsa/nw-page-editor-web:$TAG

Web server variant startup CSS and JavaScript files

Similar to the desktop variant, the web variant can also receive css and js files to modify the appearance or execute custom code on startup. To use this functionality the css and js files need to be included in some path inside the data directory. Then the docker container should be started setting as value to the CSS and JS environment variables, relative paths to the files separated by spaces. For example:

docker run --rm -d -p 8080:80 \ -e CSS=mystyle.css -e JS=mycode.js \ -v $(pwd)/data:/var/www/nw-page-editor/data \ mauvilsa/nw-page-editor-web:$TAG

Application usage shortcuts

ShortcutCommand
ctrl/cmd + qQuit application
ctrl/cmd + oOpen file dialog
ctrl/cmd + sSave changes to file
ctrl/cmd + shift + sSave as new file dialog
ctrl/cmd + pPrint
ctrl/cmd + enterOpen/close menu
ctrl/cmd + nNew window
pageupLoad previous document
pagedownLoad next document
shift + pageupLoad previous 10th document
shift + pagedownLoad next 10th document
ctrl/cmd + zUndo last change
ctrl/cmd + yRedo last change
mouse clickSelect or deselect element
escDeselect the currently selected element
tabSelect the next element
shift + tabSelect the previous element
ctrl + tabSelect the next dragpoint
ctrl + shift + tabSelect the previous dragpoint
ctrl + fEnable/disable text filtering
mouse clickStart creating or add point to polygon/polyline
ctrl/cmd + mouse clickAdd point to and finish creating polygon/polyline
ctrl/cmd + delDelete selected element
delDelete selected element (only when text not editable)
minus + .Delete selected dragpoint
plus + .Add dragpoint next to selected
ctrl/cmd + 0View full document (initial zoom)
ctrl/cmd + 1Zoom to page of selected element or hovered
ctrl/cmd + 2Zoom to selected element or hovered
ctrl/cmd + plus/minusZoom in/out
shift + mouse wheelZoom in/out
ctrl/cmd + arrowsMove (pan) the image in the respective direction
mouse wheelMove (pan) the image in the respective direction
mouse dragMove (pan) the image in the respective direction
ctrl + ,Select the next edit mode element level
ctrl + shift + ,Select the previous edit mode element level
ctrl + .Select the next edit mode type
ctrl + shift + .Select the previous edit mode type
ctrl/cmd + eOpen property editor for selected element
ctrl/cmd + shift + pageupIncrease bottom pane text font size
ctrl/cmd + shift + pagedownDecrease bottom pane text font size
ctrl/cmd + pageupIncrease overlayed text font size
ctrl/cmd + pagedownDecrease overlayed text font size
ctrl/cmd + gChange gamma of image
alt + arrowsChange table selected cell in the respective direction
plus + c/rSplit selected table column/row
minus + c/rRemove selected table column/row
move dragpointModifies dragpoint and its opposite dragpoint(s)
shift+ move dragpointModifies only the dragpoint, not the opposite(s)
ctrl/cmd + rToggle selected element protection

Copyright

The MIT License (MIT)

GitHub

GitHub - mauvilsa/nw-page-editor: Simple app for visual editing of Page XML files
Simple app for visual editing of Page XML files. Contribute to mauvilsa/nw-page-editor development by creating an account on GitHub.