Thunken

Data Scientists/Science Dataists


Thunken is an independent data science nonprofit. We specialize in analyzing research data, including scholarly publications, patents, clinical trials, and grey literature. We monitor research outputs and the attention they receive, and we root for the underdogs.

Our main project is Cobaltmetrics. We also offer consulting services in data science, text mining, machine learning, and information architecture.

We believe that technology can foster diversity, and we have joined both the Coalition for Diversity and Inclusion in Scholarly Communication and the Helsinki Initiative on Multilingualism in Scholarly Communication.


With Cobaltmetrics, we are on a mission to make altmetrics genuinely alternative.

We collect data from and about underrepresented communities in the scholarly web: communities who use languages other than English, who cannot rely solely on persistent identifiers, who use new or non-mainstream publication venues, etc. We go deeper than backlink databases and altmetrics aggregators to help you report on all types of content.

Cobaltmetrics connects hyperlinks and persistent identifiers to uncover more citations and pay off your FAIRness debt. The web is our corpus, we support 300+ languages, and our API collates citations to all known versions and copies of a document.
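To illustrate the idea of connecting hyperlinks and persistent identifiers, here is a minimal Python sketch that groups citations by document, whether they cite it by URL or by DOI. It is not the Cobaltmetrics API or its actual method: the function names, the DOI pattern, and the example URLs are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed, simplified pattern for DOIs found in URLs or "doi:" strings.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s?#]+", re.IGNORECASE)


def document_key(reference: str) -> str:
    """Map a hyperlink or identifier to one key per document (hypothetical helper)."""
    match = DOI_RE.search(reference)
    if match:
        # DOIs are case-insensitive, so lowercase them before comparing.
        return "doi:" + match.group(0).lower()
    # Fallback: use the URL itself, without the scheme or trailing slashes.
    return "url:" + re.sub(r"^https?://", "", reference.strip()).rstrip("/")


def collate(citations):
    """Group (citing_page, cited_reference) pairs by document key."""
    grouped = defaultdict(set)
    for citing_page, cited_reference in citations:
        grouped[document_key(cited_reference)].add(citing_page)
    return dict(grouped)


# Illustrative data only: two pages cite the same document in different ways.
citations = [
    ("https://blog.example/post", "https://doi.org/10.1234/ABC.5"),
    ("https://news.example/item", "doi:10.1234/abc.5"),
    ("https://wiki.example/page", "https://example.org/report/"),
]
result = collate(citations)
```

In this sketch, the first two citations end up under the same key even though one is a hyperlink and the other a bare identifier. A real system would also need to link the different versions and copies of a document, which this example does not attempt.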

Consulting Services

Software Development

Turn your ideas into reality. We can help with all stages of your project: prototyping, data collection, performance improvements, benchmarking, etc.

Machine Learning

Turn your observations into models. We help you work with unstructured data using methods like document labeling, semantic search, topic extraction, etc.

Business Intelligence

Turn your data into actionable insights. We help you use text mining and data science to structure and visualize your data, and to automate your processes.

Other Areas of Expertise

We provide on-demand business advice. See our profile on Clarity for more information.


Luc Boruta

Director of Research

Ph.D. in computational linguistics, natural language processor, interested in linked data and linguistic diversity. In previous lives, Luc played whac-a-phoneme in a top-notch research lab in Paris, and he worked on the development of intelligent personal assistants and knowledge navigators in Montréal. He eats metadata for breakfast.

Damien Vannson

Director of Technology

Builder at heart, driven by the satisfaction of turning shower thoughts and back-of-the-envelope plans into full-fledged, user-friendly applications. After working on various web applications in Europe and the US, including building one of the world's largest scholarly databases, Damien is now overseeing technological developments at Thunken.


We build and maintain crawlers, search engines, and text mining tools for various clients across Europe and North America.

Since 2020, we have been working with the University of Freiburg on RMTMO-RI. We built an online database of public and private research infrastructures in the Upper Rhine region.

In 2019, we worked with the European Science Foundation on MERIL. We helped the team review and finalize the database, and we drafted a sustainability plan that includes the creation of persistent identifiers for research infrastructures.

Since 2018, we have been working with INSEEC U to audit and upgrade the information technology infrastructure of their business schools.

We advised Lettria from 2018 to 2019 on their work to build a full-fledged natural language processing API for French.

Nanomolar outsourced their full-stack development to us from 2017 to 2020. We worked on all technical aspects of the project, from web development to text mining on patents and technology transfer documents.

MyScienceWork outsourced their R&D to us from 2017 to 2018. We helped their team deduplicate and ingest tens of millions of documents into their databases, and we built various text mining tools to analyze scholarly publications and patents.

We have also worked with technology companies such as eRowz and LakePharma, and other clients via Clarity.

Contact Us

Ask us how we can help! Email us at [email protected] to inquire about your next project with us, or just to say hello.

We are also active on Clarity, GitHub, and LinkedIn.

Email is our preferred method of communication, but snail mail can be addressed to: Thunken ℅ Luc Boruta, 15 rue Lamarck, 31400 Toulouse, France.