January 2024: Looking back on 2023 - new changes to nopaque
Hello nopaque users!
First of all, the nopaque team would like to wish everyone a good start to 2024! We hope you found the time to relax over the winter break.
Now that the new year has come around and we’re all back in the office, we wanted to take the opportunity to tell you about the most important things we’ve worked on in nopaque in 2023 – things we’ve incorporated into our latest nopaque update as of late December 2023. You may have noticed some of them as you’ve returned to your projects on nopaque.
Changes to the Query Builder
The Query Builder has undergone changes to make it more intuitive to use and is now the standard option for creating queries. Individual elements of a query can now be modified and edited by clicking on them. An input marker shows your position in the query and where new elements will be added. The marker and all other elements can be moved around via drag and drop. A new toggle button lets users switch between the Query Builder and Expert Mode if they prefer to work with plain Corpus Query Language (CQL) instead. This can be done in the middle of an existing query: existing chips will be “translated” into CQL. It also works the other way around: if you switch back, your CQL query will be parsed into chips. More details and instructions on how to use the new Query Builder can be found in the manual.
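To illustrate what “translating” chips into CQL could look like, here is a minimal Python sketch. It is not nopaque's implementation: the chip representation (a list of dictionaries), the function name `chips_to_cql` and the example tag value are hypothetical. Only the general CQL token syntax `[attribute="value"]` is assumed.

```python
def chips_to_cql(chips):
    """Translate a list of hypothetical token chips into a CQL query string.

    Each chip is a dict that maps a positional attribute (for example
    "word", "lemma" or "pos") to the value it should match. An empty
    dict stands for "any token". Values are inserted as-is; escaping of
    quotes and regex characters is left out of this sketch.
    """
    tokens = []
    for chip in chips:
        # Several constraints on the same token are joined with "&".
        conditions = " & ".join(
            f'{attribute}="{value}"' for attribute, value in chip.items()
        )
        tokens.append(f"[{conditions}]")
    # Consecutive token expressions are separated by spaces.
    return " ".join(tokens)


query = chips_to_cql([{"lemma": "be"}, {}, {"word": "good", "pos": "ADJ"}])
print(query)
```

Going the other way (parsing CQL back into chips) needs a real parser and is not covered by this sketch.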
The most extensive changes to nopaque have taken place in the Social Area. We want nopaque to be a platform where researchers can connect with each other, so we’ve added some more features to make this possible. Users can now make their personal profiles publicly visible to others on nopaque, including a short “About me” section, their website, organization and location, and an avatar that others can see. It is also possible to share corpora with other researchers via share links, access invitations, or by setting corpus visibility to Public. Other users can only see the metadata of public corpora – further access can be granted upon request. The extent of access to these shared corpora is managed by assigning the roles of Viewer, Contributor, and Administrator. Viewers may only download the files. Contributors can download and edit files and their metadata as well as analyze and build the corpus. Administrators can manage users, followers and visibility, in addition to all of the above.
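The three roles form a simple hierarchy in which each role includes the permissions of the one before it. The following Python sketch models that description. The permission names and the helper `is_allowed` are illustrative labels chosen for this example, not identifiers from nopaque.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "Viewer"
    CONTRIBUTOR = "Contributor"
    ADMINISTRATOR = "Administrator"


# Permission names are hypothetical labels for the actions described above.
VIEWER_PERMISSIONS = {"download_files"}
CONTRIBUTOR_PERMISSIONS = VIEWER_PERMISSIONS | {
    "edit_files",
    "edit_metadata",
    "analyze_corpus",
    "build_corpus",
}
ADMINISTRATOR_PERMISSIONS = CONTRIBUTOR_PERMISSIONS | {
    "manage_users",
    "manage_followers",
    "manage_visibility",
}

PERMISSIONS = {
    Role.VIEWER: VIEWER_PERMISSIONS,
    Role.CONTRIBUTOR: CONTRIBUTOR_PERMISSIONS,
    Role.ADMINISTRATOR: ADMINISTRATOR_PERMISSIONS,
}


def is_allowed(role, action):
    """Return True if the given role includes the given action."""
    return action in PERMISSIONS[role]


print(is_allowed(Role.VIEWER, "edit_files"))
print(is_allowed(Role.ADMINISTRATOR, "build_corpus"))
```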
July 2023: Visualization Update (beta) - new analysis features
We wanted to give you some news on updates we’re making to nopaque. Since we want to make it easier for users to grasp and work with different elements of their data, we’ve been working on adding some visualization features to the Corpus Analysis service. Currently, the two main modules, “Reader” and “Concordance”, have been joined by an additional “Static Visualizations” module, but there’s more to come!
With the Static Visualizations module, it’s now possible to view information about your corpus, such as the number of (unique) tokens, sentences, lemmata, corresponding information on individual texts, the distribution of these elements within your corpus, as well as searchable lists of word frequencies with stopwords that can be preset and modified. In the future, this area will be extended with more advanced visualization options.
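For readers who want to see how figures like these are derived, the following Python sketch computes token counts and a stopword-filtered frequency list for a small token list. It only illustrates the kind of statistics described above; the function name `corpus_statistics` and the sample data are assumptions and do not reflect how nopaque computes its values.

```python
from collections import Counter


def corpus_statistics(tokens, stopwords=frozenset()):
    """Return basic counts and a frequency list for a list of tokens."""
    counts = Counter(tokens)
    # Drop stopwords from the frequency list (case-insensitive match).
    frequencies = Counter(
        {
            token: count
            for token, count in counts.items()
            if token.lower() not in stopwords
        }
    )
    return {
        "tokens": len(tokens),
        "unique_tokens": len(counts),
        "frequencies": frequencies.most_common(),
    }


tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]
stats = corpus_statistics(tokens, stopwords={"the", "on"})
print(stats["tokens"], stats["unique_tokens"])
print(stats["frequencies"])
```

Changing the `stopwords` argument corresponds to modifying the preset stopword list: the token counts stay the same, only the frequency list changes.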
We’ll keep you posted about further visualization updates. Until then, we hope the latest update improves your research experience with nopaque. And as always, if you have any ideas for nopaque or need assistance, don’t hesitate to contact us!
November 2022: Contribution Update
Users can now upload their own language models into nopaque. This is useful for working with languages that are not available as standard in nopaque or if a user wants to work with a language model they have developed themselves. Tesseract models can be uploaded in .traineddata format; spaCy models can be uploaded in .tar.gz format. We are also working on the option to upload models in .whl format in the future. Uploaded models can be found in the model list of the corresponding service and can be used immediately. Models can also be made public if you have the Contributor role in nopaque.
Please note: The Contributor role must be requested from the nopaque admins if you would like to make a model public for all users.
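Before uploading, it can be handy to check that a file has the expected extension. The snippet below is a small local helper sketch, assuming only the formats named above. The service keys, the function name and the example filenames are hypothetical, and the check looks at the filename only, not at the file contents.

```python
# Accepted file suffix per service, as described in this update.
ACCEPTED_SUFFIXES = {
    "tesseract": ".traineddata",
    "spacy": ".tar.gz",
}


def is_accepted_model_file(service, filename):
    """Return True if the filename has the suffix expected for the service."""
    suffix = ACCEPTED_SUFFIXES.get(service)
    if suffix is None:
        # Unknown service: nothing can be accepted.
        return False
    return filename.lower().endswith(suffix)


print(is_accepted_model_file("tesseract", "my_model.traineddata"))
print(is_accepted_model_file("spacy", "my_model.whl"))
```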
April 2022: April updates – more features, faster operation
In April 2022, we released an update improving many elements of nopaque. We rewrote a lot of our code, including a significant reworking of our backend code for more efficient use of our servers. We integrated a new service, updated the existing ones, and made some minor design improvements.
We may be a bit late with our spring cleaning, but we’ve tidied up our database system and deleted old, empty corpora, unconfirmed user accounts and unnecessary data fields.
By partnering with Transkribus, we’ve reached one of our long-term goals: to integrate a Handwritten Text Recognition (HTR) service into nopaque. The Transkribus HTR Pipeline service is implemented as a kind of proxied service where the work is split between us and Transkribus. That means we do the preprocessing, storage and postprocessing, while Transkribus handles the HTR itself.
One change we needed to make in the background was to fix our performance issues. While implementing the Transkribus HTR Pipeline service, we saw optimization potential in different steps of our processing routine. These optimizations are now also available in our Tesseract OCR Pipeline service and result in speeds that are about four times faster than before. We’re now finished with the major optimizations, but there could be more soon, so stay tuned!
Next, we reorganized our Corpus Analysis code. It was a bit messy, but after a complete rewrite, we are now able to query a corpus without long loading times and with better error handling, making the user experience much more stable. The Corpus Analysis service is now modularized and comes with two modules that recreate and extend the functionality of the old service.
The Query Result viewer had to be temporarily disabled, as its code was based on the old Corpus Analysis service. It will be reintegrated as a module of the Corpus Analysis service.
The spaCy NLP Pipeline service was also taken care of with some smaller updates. This is important preliminary work for support of more models/languages missing the full set of linguistic features (lemma, ner, pos, simple_pos). It still needs some testing and adjustments but will be ready soon!
Last, but not least, we made some design changes. Now, you can find color in places that were previously in black and white. Nothing big, but the new colors can aid in identifying resources more efficiently.
Where is my job data?
We reached our storage limit at the beginning of the year. At this time, some users may have noticed system instability. Fortunately, we found a solution that avoided data loss by deleting some non-nopaque related data in our system (yes, we also do things other than nopaque). To avoid facing the same problem again, we had to find a long-term solution. In the end, this involved the deletion of all previous job data with this update and, going forward, only keeping new job data for three months after job creation (important note: corpora are not affected). All job data created prior to this update has been backed up for you. Feel free to contact us at email@example.com if you would like to get this data back.
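If you want to estimate when a job's data will be removed, the three-month rule can be sketched in Python as follows. This is an illustration under assumptions: "three months" is interpreted as three calendar months, the day is clamped to the end of shorter months, and data counts as expired only after that date has passed. nopaque's actual cleanup may differ in these details.

```python
import calendar
from datetime import date


def add_months(day, months):
    """Shift a date by whole calendar months, clamping the day to the
    last day of the target month (30 Nov + 3 months gives 28 or 29 Feb)."""
    month_index = day.month - 1 + months
    year = day.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def is_expired(created_on, today, retention_months=3):
    """Return True once the retention period after job creation has passed."""
    return today > add_months(created_on, retention_months)


print(add_months(date(2022, 11, 30), 3))
print(is_expired(date(2022, 4, 1), date(2022, 7, 2)))
```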
September 2021: nopaque's beta launch
Hello to all our users!
The BETA version of our web platform, nopaque, is now available! Nopaque is a web application that offers different services and tools to support researchers working with image and text-based data. These services include:
- File Setup, which converts and merges different data (e.g., books, letters) for further processing
- Optical Character Recognition, which converts photos and scans into text data for machine readability
- Natural Language Processing, which extracts information from your text via computational linguistic data processing (tokenization, lemmatization, part-of-speech tagging and named-entity recognition)
- Corpus Analysis, which makes use of CQP Query Language to search through text corpora with the aid of metadata and Natural Language Processing tags
Nopaque was created based on our experiences working with other subprojects and a prototype user study in the first phase of funding. The platform is open source under the terms of the MIT license (https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nopaque). Language support and functions are currently limited – extensions can be requested by sending an email to firstname.lastname@example.org. Because we are still in the beta phase, some bugs are to be expected. If you encounter any problems, please let us know! We are thankful for all feedback we receive.