User manual


nopaque is a web-based digital working environment. It implements a workflow based on the research process in the humanities and supports its users in processing their data so that digital analysis methods can subsequently be applied to it. All processing takes place in a dedicated cloud environment using established open source software, which ensures that no personal data of the users is disclosed.

Registration and Log in

Before you can start using the web platform, you need to create a user account. Only a few details are required: a user name, an e-mail address and a password. To register, fill out the form on the registration page. After successful registration, the created account must be verified; to do this, follow the instructions in the automatically sent e-mail. Afterwards, you can log in with your username or e-mail address and password using the log-in form located next to the registration button.



Dashboard

The dashboard provides a central overview of all resources assigned to the user: corpora and created jobs. Corpora are freely composable annotated text collections, and jobs are the initiated file processing procedures. Both the job and the corpus listings can be searched using the search field displayed above them.

Corpus

A corpus is a collection of texts that can be analyzed using the Corpus Analysis service. All texts must be in the verticalized text file format (.vrt), which can be obtained via the Natural Language Processing service. In addition to the actual text, this format contains further annotations that are searchable during your analysis, in combination with optionally added metadata.

Job

A job is a construct that represents the execution of a service. It stores input files, output files, the processing status, and the options selected during creation. After submitting a job, you are redirected to a job overview page, which can be accessed again via the job list on the dashboard. Jobs are deleted three months after creation, so we encourage you to download the results once a job is completed.



Services

nopaque was designed from the ground up to be modular. This modularity means that the offered workflow provides variable entry and exit points, so that different starting points and goals can be addressed flexibly. Each of these modules is implemented as a self-contained service, each of which represents one step in the workflow. The services are coordinated so that they can be used consecutively. Their order can be taken either from the listing of the services in the left sidebar or from the roadmap (accessible via the pink compass in the upper right corner). All services are versioned, so the data generated with nopaque is always reproducible.

File Setup

The File Setup Service bundles image data, such as scans and photos, together in a handy PDF file. To use this service, use the job form to select the images to be bundled, choose the desired service version, and specify a title and description. Please note that the service sorts the images into the resulting PDF file based on the file names. So naming the images correctly is of great importance. It has proven to be a good practice to name the files according to the following scheme: page-01.png, page-02.jpg, page-03.tiff, etc. In general, you can assume that the images will be sorted in the order in which the file explorer of your operating system lists them when you view the files in a folder sorted in ascending order by file name.
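The effect of this lexicographic (file-explorer style) ordering can be sketched in a few lines of Python; the file names here are made up for illustration:

```python
# Zero-padded page numbers sort into the intended page order ...
padded = ["page-10.png", "page-02.jpg", "page-01.png", "page-03.tiff"]
print(sorted(padded))    # page-01, page-02, page-03, page-10

# ... while unpadded numbers do not: "page-10" sorts before "page-2",
# because strings are compared character by character.
unpadded = ["page-10.png", "page-2.jpg", "page-1.png", "page-3.tiff"]
print(sorted(unpadded))  # page-1, page-10, page-2, page-3
```

This is why the zero-padded naming scheme above is recommended: it keeps the alphabetical order identical to the page order.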

Optical Character Recognition (OCR)

Coming soon...

Handwritten Text Recognition (HTR)

Coming soon...

Natural Language Processing (NLP)

Coming soon...

Corpus Analysis

With the corpus analysis service, it is possible to create a text corpus and then explore it in an analysis session. The analysis session is realized on the server side by the Open Corpus Workbench software, which enables efficient and complex searches with the help of the CQP Query Language.

A closer look at Corpus Analysis

Create a corpus

To create a corpus, you can use the "New Corpus" button, which can be found both on the Corpus Analysis service page and on the dashboard below the corpus list. Fill in the input mask to create the corpus. After you have completed the input mask, you will automatically be taken to the corpus overview page (which can be called up again via the corpus lists) of your new and accordingly still empty corpus.

Now you can add texts in the .vrt format (results of the NLP service) to your new corpus. To do this, use the "Add Corpus File" button and fill in the form that appears; you will have the possibility to add metadata to each text. After you have added all the desired texts to the corpus, the corpus must be prepared for analysis. This process is initiated by clicking the "Build" button. On the corpus overview page, you can always see the current status of the corpus in the upper right corner. After the build process, the status should be "built".

Analyze a corpus

After you have created and built a corpus, it can be analyzed. To do this, use the button labeled Analyze. The corpus analysis currently offers two modules, the Reader module and the Concordance module. The Reader module can be used to read your tokenized corpus in different ways. You can select a token representation option, which determines the property of a token to be shown; for example, you can read your text completely lemmatized. You can also change how a token is displayed by using the text style switch. The Concordance module offers some more options regarding the context size of search results. If the context does not provide enough information, you can jump into the Reader module by using the magnifier icon next to a match.

CQP Query Language

Within the CQP query language, a distinction is made between two types of annotations: positional attributes and structural attributes. Positional attributes refer to a token; e.g. the word "book" is assigned the part-of-speech tag "NN", the lemma "book" and the simplified part-of-speech tag "NOUN" within the token structure. Structural attributes refer to elements that structure the text, such as sentence and entity markup. For example, the markup of a sentence is represented in the background as follows:

    <s>                                     structural attribute
    word    pos    lemma    simple_pos      positional attribute
    <ent type="PERSON">                     structural attribute
    word    pos    lemma    simple_pos      positional attribute
    word    pos    lemma    simple_pos      positional attribute
    </ent>                                  structural attribute
    word    pos    lemma    simple_pos      positional attribute
    </s>                                    structural attribute
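As a concrete (purely illustrative) example, the sentence "Ada wrote programs." could appear in a verticalized text file roughly like this; the actual tag values depend on the language-specific tagset used by the NLP service:

    <s>
    <ent type="PERSON">
    Ada         NP      Ada         PROPN
    </ent>
    wrote       VBD     write       VERB
    programs    NNS     program     NOUN
    .           SENT    .           PUNCT
    </s>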

Positional attributes

Before you can start searching for positional attributes (also called tokens), it is necessary to know what properties they contain.

  1. word: The string as it is also found in the original text
  2. pos: A code for the word type, also called POS tag
    1. IMPORTANT: POS tags are language-dependent to best reflect language-specific properties.
    2. The codes (= tagsets) can be taken from the Corpus Analysis Concordance page.
  3. lemma: The lemmatized representation of the word
  4. simple_pos: A simplified code for the word type that covers fewer categories than the pos property, but is the same across languages.
    1. The codes (= tagsets) can be taken from the Corpus Analysis Concordance page.

Searching for positional attributes

Token with no condition on any property (also called wildcard token)

[];                            Each token matches this pattern

Token with a condition on its word property

[word="begin"];                “begin”
[word="begin" %c];             same as above but ignores case

Token with a condition on its lemma property

[lemma="begin"];               “begin”, “began”, “beginning”, …
[lemma="begin" %c];            same as above but ignores case

Token with a condition on its simple_pos property

[simple_pos="VERB"];           “begin”, “began”, “beginning”, …

Token with a condition on its pos property

[pos="VBG"];                   “beginning”, “being”, … (VBG: gerund/present participle)

Look for words with a variable character (also called wildcard character)

[word="beg.n"];                “begin”, “began”, “begun”
          ^ the dot represents the wildcard character

Token with two conditions on its properties, where both must be fulfilled (AND operation)

[lemma="be" & simple_pos="VERB"];          Lemma “be” and simple_pos is Verb
            ^ the ampersand represents the and operation

Token with two conditions on its properties, where at least one must be fulfilled (OR operation)

[simple_pos="VERB" | simple_pos="ADJ"];    simple_pos VERB or simple_pos ADJ (adjective)
                   ^ the pipe represents the or operation


Sequence of multiple tokens

[simple_pos="NOUN"] [simple_pos="VERB"];       NOUN -> VERB
[simple_pos="NOUN"] [] [simple_pos="VERB"];    NOUN -> wildcard token -> VERB

Incidence modifiers
Incidence modifiers are special characters or patterns that control how often the character/token immediately before them may occur.

  1. +: One or more occurrences of the character/token before
  2. *: Zero or more occurrences of the character/token before
  3. ?: Zero or one occurrences of the character/token before
  4. {n}: Exactly n occurrences of the character/token before
  5. {n,m}: Between n and m occurrences of the character/token before
[word="beg.+"];        “begging”, “begin”, “began”, “begun”, …
[word="beg.*"];        “beg”, “begging”, “begin”, “begun”, …
[word="beg?"];         “be”, “beg”
[word="beg.{2}"];      “begin”, “begun”, …
[word="beg.{2,4}"];    “begging”, “begin”, “begun”, …
[word="beg{2}.*"];     “begged”, “beggar”, …
[simple_pos="NOUN"] []? [simple_pos="VERB"];    NOUN -> wildcard token (x0 or x1) -> VERB
[simple_pos="NOUN"] []* [simple_pos="VERB"];    NOUN -> wildcard token (any number, including zero) -> VERB
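Since CQP value patterns are regular expressions, the character-level incidence modifiers above behave like standard regex quantifiers. A quick Python sketch (the word list is made up for demonstration):

```python
import re

words = ["be", "beg", "begin", "began", "begun", "begging", "begged"]

def matches(pattern, words):
    # re.fullmatch mirrors CQP, which matches a pattern against the whole token
    return [w for w in words if re.fullmatch(pattern, w)]

print(matches(r"beg.+", words))    # begin, began, begun, begging, begged
print(matches(r"beg?", words))     # be, beg
print(matches(r"beg.{2}", words))  # begin, began, begun
```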

Option groups
Find character sequences from a list of options.

[word="be(g|gin|gan|gun)"];			“beg”, “begin”, “began”, “begun”
         ^             ^ the parentheses indicate the start and end of an option group
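An option group is an ordinary regex alternation, so its behavior can be checked with Python's re module (the word list is made up for demonstration):

```python
import re

words = ["be", "beg", "begin", "began", "begun", "begging"]
hits = [w for w in words if re.fullmatch(r"be(g|gin|gan|gun)", w)]
print(hits)  # beg, begin, began, begun
```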

Structural attributes

nopaque provides several structural attributes for querying. A distinction is made between attributes with and without a value.

  1. s: Annotates a sentence
  2. ent: Annotates an entity
    1. *ent_type: Annotates an entity and has as value a code that identifies the type of the entity.
      1. The codes (= tagsets) can be taken from the Corpus Analysis Concordance page.
  3. text: Annotates a text
    1. Note that all the following attributes have the data entered during the corpus creation as value.
    2. *text_address
    3. *text_author
    4. *text_booktitle
    5. *text_chapter
    6. *text_editor
    7. *text_institution
    8. *text_journal
    9. *text_pages
    10. *text_publisher
    11. *text_publishing_year
    12. *text_school
    13. *text_title

Searching for structural attributes

<ent> [] </ent>;                       A one token long entity of any type
<ent_type="PERSON"> [] </ent_type>;     A one token long entity of type PERSON
<ent_type="PERSON"> []* </ent_type>;    Entity of any length of type PERSON
<ent_type="PERSON"> []* </ent_type> []* [simple_pos="VERB"] :: match.text_publishing_year="1991";
Arbitrarily long entity of type PERSON -> Arbitrarily many tokens -> VERB but only within texts with publication year 1991
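Structural attributes can also constrain a whole query: standard CQP provides the within keyword, which restricts a match to a single region, e.g. one sentence. A sketch using standard CQP syntax (not taken from the examples above):

    [simple_pos="NOUN"] []* [simple_pos="VERB"] within s;    NOUN -> arbitrarily many tokens -> VERB, all within one sentence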