Semantic Fingerprint API and Natural Language Processing API


Fingerprint

Use the Semantic Fingerprint API to create semantic fingerprints from any kind of text in four languages (English, Spanish, French, German). Semantic fingerprints are text embeddings that can be used to train language models and to efficiently perform text operations such as search and classification.

Returns the semantic fingerprint of the given text.

This endpoint converts the input text into a semantic fingerprint. First, each word is converted into its fingerprint representation. These word fingerprints are then aggregated and sparsified into a single text fingerprint.
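As a rough illustration of the aggregation and sparsification steps, the sketch below assumes that a fingerprint is a set of active positions on a 128 x 128 grid (16,384 positions), that word fingerprints are combined by counting how many words activate each position, and that sparsification keeps the highest-count positions up to the max_density limit. The service's actual weighting (POS, tf-idf, inverse position) is not modeled, and aggregate_fingerprints is a hypothetical helper, not part of the API.

```python
from collections import Counter
from typing import Iterable, List, Set

# Assumption: fingerprints live on a 128 x 128 grid of positions.
GRID_SIZE = 128 * 128


def aggregate_fingerprints(word_fps: Iterable[Set[int]], max_density: float = 0.02) -> List[int]:
    """Hypothetical sketch: combine word fingerprints into one sparse text fingerprint."""
    # Count how many word fingerprints activate each position.
    counts = Counter()
    for fp in word_fps:
        counts.update(fp)

    # Sparsify: keep at most max_density * GRID_SIZE positions.
    max_active = int(max_density * GRID_SIZE)
    # Highest counts first; ties are broken by the lower position index.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return sorted(pos for pos, _ in ranked[:max_active])


if __name__ == "__main__":
    words = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
    print(aggregate_fingerprints(words))                              # [1, 2, 3, 4, 5]
    print(aggregate_fingerprints(words, max_density=2 / GRID_SIZE))  # [2, 3]
```

With a density budget of only two positions, the positions shared by the most words (3, then 2) survive, which is the intuition behind sparsifying an aggregated fingerprint.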

query Parameters
retina_name
required
string (Retina Name)

The name of the retina

use_phrase_fingerprints
boolean (Use Phrase Fingerprints)
Default: true

Whether to tokenize and fingerprint known phrases. If false, the fingerprints of the individual tokens are used instead.

pos_filter
boolean (Pos Filter)
Default: true

If enabled, removes terms whose part-of-speech (POS) type is not noun, verb, or adjective.

stop_filter
boolean (Stop Filter)
Default: true

If enabled, removes stopwords.

max_df
number (Max Df)
Default: 0.1

Ignores terms whose document frequency exceeds the specified threshold.

max_terms
integer (Max Terms) >= 0
Default: 50

Uses only the n most useful terms, where n is the value of max_terms.

pos_weighting
boolean (Pos Weighting)
Default: true

If enabled, weights nouns and proper nouns higher than adjectives and verbs. Other POS types are ignored.

tfidf_weighting
boolean (Tfidf Weighting)
Default: true

If enabled, weights terms by tf-idf.

inverse_position_weighting
boolean (Inverse Position Weighting)
Default: false

If enabled, weights positions by their inverse frequency.

max_density
number (Max Density) [ 0 .. 1 ]
Default: 0.02

Maximum density of the aggregated fingerprint, i.e. the largest fraction of its positions that may be active.
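A minimal sketch of how a client might assemble this request with Python's standard library. The base URL, the path /fingerprint, the POST method and the retina name en_general are hypothetical placeholders, since this section does not state them; authentication, if the service requires it, is also omitted. The helper only builds the request and rejects parameter names that are not in the list above.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: the base URL and path are not given in this document.
BASE_URL = "https://api.example.com"
FINGERPRINT_PATH = "/fingerprint"

# Optional query parameters documented for this endpoint.
OPTIONAL_PARAMS = {
    "use_phrase_fingerprints", "pos_filter", "stop_filter", "max_df", "max_terms",
    "pos_weighting", "tfidf_weighting", "inverse_position_weighting", "max_density",
}


def build_fingerprint_request(text: str, retina_name: str, **options) -> urllib.request.Request:
    """Build (but do not send) a fingerprint request; omitted options keep server defaults."""
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {"retina_name": retina_name}
    for name, value in options.items():
        # Encode booleans as lowercase "true"/"false" in the query string.
        params[name] = str(value).lower() if isinstance(value, bool) else value
    url = f"{BASE_URL}{FINGERPRINT_PATH}?{urllib.parse.urlencode(params)}"
    body = json.dumps({"text": text}).encode("utf-8")
    # Assumption: POST, because the endpoint takes a JSON request body.
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


if __name__ == "__main__":
    req = build_fingerprint_request("Tigers hunt at night.", "en_general", max_terms=20, pos_filter=False)
    print(req.get_method(), req.full_url)
```

Once the real URL is known, the request object can be passed to urllib.request.urlopen to send it.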

Request Body schema: application/json
required
text
required
string (Text)

Responses

Request samples

Content type
application/json
{
  "text": "string"
}

Response samples

Content type
application/json
{
  "fp_type": "binary",
  "representation": []
}
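A client will usually turn the response into a set of active positions. The sketch below assumes that, for fp_type "binary", representation lists the indices of the active positions on a 128 x 128 grid; neither the meaning of the array nor the grid size is stated in this section, so treat both as assumptions. It also computes the density, which can be compared with the max_density query parameter.

```python
import json
from typing import Set

# Assumption: fingerprints live on a 128 x 128 grid of positions.
GRID_SIZE = 128 * 128


def parse_binary_fingerprint(payload: str) -> Set[int]:
    """Hypothetical sketch: read a fingerprint response into a set of active positions."""
    data = json.loads(payload)
    if data.get("fp_type") != "binary":
        raise ValueError(f"unsupported fp_type: {data.get('fp_type')!r}")
    positions = set(data["representation"])
    # Reject indices that cannot exist on the assumed grid.
    if any(not 0 <= p < GRID_SIZE for p in positions):
        raise ValueError("position outside the assumed grid")
    return positions


def density(positions: Set[int]) -> float:
    """Fraction of grid positions that are active."""
    return len(positions) / GRID_SIZE


if __name__ == "__main__":
    fp = parse_binary_fingerprint('{"fp_type": "binary", "representation": [3, 17, 42]}')
    print(sorted(fp), density(fp))
```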

Texts

Use the Natural Language Processing API to extract keywords, detect languages, compare documents, generate labels, or segment long pieces of text.

Supported Languages

This endpoint returns the list of supported languages. The two-letter language codes can be used as input for other endpoints. The service supports semantic operations for these languages.

Responses

Response samples

Content type
application/json
{
  "supported_languages": []
}
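Because the language codes returned here can be passed to other endpoints, a client may check a code before using it. This sketch assumes the supported_languages array holds plain code strings such as "en" (its items are not shown in the sample). When a code is unsupported, the hypothetical helper returns None so the caller can omit the language field and let the service infer it.

```python
import json
from typing import List, Optional


def supported_languages(payload: str) -> List[str]:
    """Extract the two-letter codes from a Supported Languages response."""
    return list(json.loads(payload)["supported_languages"])


def choose_language(requested: str, payload: str) -> Optional[str]:
    """Return the requested code if supported, else None (omit the field, let the service infer)."""
    return requested if requested in supported_languages(payload) else None


if __name__ == "__main__":
    sample = '{"supported_languages": ["en", "de", "es", "fr"]}'
    print(choose_language("de", sample), choose_language("it", sample))
```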

Keywords Extraction

This endpoint extracts the semantically most relevant words from a given text. The number of keywords, set with the limit parameter, should ideally be chosen in proportion to the length of the text; if limit is omitted or 0, the service determines an appropriate number automatically.

query Parameters
limit
integer (Limit) >= 0
Default: 0

Maximum number of keywords to return. If unspecified or equal to zero, an appropriate number is determined automatically.
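One possible way to pick limit in proportion to the text length is sketched below. The ratio of one keyword per 20 words and the bounds of 3 and 30 are arbitrary assumptions, not service recommendations; passing 0 or omitting limit leaves the choice to the service.

```python
def keyword_limit(text: str, words_per_keyword: int = 20, min_limit: int = 3, max_limit: int = 30) -> int:
    """Hypothetical heuristic: about one keyword per `words_per_keyword` words, clamped to bounds."""
    n_words = len(text.split())
    return max(min_limit, min(max_limit, n_words // words_per_keyword))


if __name__ == "__main__":
    print(keyword_limit("word " * 200))  # 10
```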

Request Body schema: application/json
required
text
required
string (Text) .*\S.*

The input text. Cannot be empty.

language
string (Language) ^([a-z][a-z])?$

Language of the text, e.g. 'en' (ISO 639-1). If not provided, the service will try to infer it from the text.
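The sketch below validates the two body fields on the client before sending, following the patterns in the schema. It reads .*\S.* as "contains at least one non-whitespace character" and ^([a-z][a-z])?$ as "empty or two lowercase letters"; the server's regex engine may treat edge cases differently. The helper name build_text_body is hypothetical. The same text and language fields also appear in the Similar Terms and Text Segmentation request bodies.

```python
import re
from typing import Optional

# "Empty or two lowercase letters", checked against the whole string.
LANGUAGE_RE = re.compile(r"([a-z][a-z])?")


def build_text_body(text: str, language: Optional[str] = None) -> dict:
    """Validate fields client-side and build a {"text", "language"} request body."""
    if not re.search(r"\S", text):
        raise ValueError("text cannot be empty or whitespace-only")
    body = {"text": text}
    if language is not None:
        if not LANGUAGE_RE.fullmatch(language):
            raise ValueError(f"language must be a lowercase ISO 639-1 code, got {language!r}")
        body["language"] = language
    return body


if __name__ == "__main__":
    print(build_text_body("Cortical.io builds text analytics.", "en"))
```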

Responses

Request samples

Content type
application/json
{
  "text": "Cortical.io’s mission is to deliver AI-based solutions that streamline the extraction, classification, review and analysis of information hidden in unstructured text while providing short time to value. We accomplish this through our novel, meaning-based approach to natural language understanding that solves many critical challenges of text processing in a business context. With more than 10 years of expertise in implementing intelligent document processing solutions in the enterprise, Cortical.io has demonstrated its ability to solve the challenges of language ambiguity and variability across many use cases and verticals and is trusted by major companies across the globe.",
  "language": "en"
}

Response samples

Content type
application/json
{
  "keywords": [],
  "language": "en"
}

Semantic Similarity

This endpoint computes the semantic similarity between two texts and returns a value in the range [0, 1]. The request body is an array containing one object per text.
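This section does not say how the similarity is computed. One common measure for binary fingerprints that also falls in [0, 1] is the cosine similarity of the two sets of active positions, sketched below under the assumption that each fingerprint is a set of position indices; the service may use a different measure.

```python
import math
from typing import Set


def fingerprint_cosine(a: Set[int], b: Set[int]) -> float:
    """Cosine similarity of two binary fingerprints given as sets of active positions."""
    if not a or not b:
        return 0.0
    # For 0/1 vectors the dot product is the size of the overlap,
    # and each vector's norm is the square root of its number of active positions.
    return len(a & b) / math.sqrt(len(a) * len(b))


if __name__ == "__main__":
    print(fingerprint_cosine({1, 2, 3, 4}, {3, 4, 5, 6}))  # 0.5
```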

Request Body schema: application/json
required
Array
text
required
string (Text) .*\S.*

The input text. Cannot be empty.

language
string (Language) ^([a-z][a-z])?$

Language of the text, e.g. 'en' (ISO 639-1). If not provided, the service will try to infer it from the text.
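The body is a JSON array with one object per text. A minimal sketch of building it, assuming both texts share the same optional language (the schema allows a separate language per item); build_similarity_body is a hypothetical helper.

```python
import json
from typing import Optional


def build_similarity_body(text_a: str, text_b: str, language: Optional[str] = None) -> str:
    """Build the JSON array body: one object per text, each with an optional language."""
    items = []
    for text in (text_a, text_b):
        # Mirror the schema rule that each text must contain non-whitespace characters.
        if not text.strip():
            raise ValueError("texts cannot be empty")
        item = {"text": text}
        if language is not None:
            item["language"] = language
        items.append(item)
    return json.dumps(items)


if __name__ == "__main__":
    print(build_similarity_body("Tigers hunt deer.", "Lions hunt zebras.", "en"))
```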

Responses

Request samples

Content type
application/json
[
  {},
  {}
]

Response samples

Content type
application/json
{
  "similarity": 0.7,
  "languages": []
}

Detect Language

This endpoint detects the language of a given text using a fastText language detection model. The model is unreliable on very short texts, so the input should contain at least a few words. If the input text contains multiple languages, the endpoint returns only the language with the highest confidence.
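Since detection is unreliable on very short texts, a client may want to skip the call for them. In the sketch below the four-word threshold and the "en" fallback are arbitrary assumptions, and detect stands for any function that calls this endpoint and returns the detected code; a stub is used in the usage example.

```python
from typing import Callable

# Hypothetical threshold: the documentation only says "a few words" are needed.
MIN_WORDS_FOR_DETECTION = 4


def pick_language(text: str, detect: Callable[[str], str], default: str = "en",
                  min_words: int = MIN_WORDS_FOR_DETECTION) -> str:
    """Call `detect` only for texts long enough for a reliable result."""
    if len(text.split()) < min_words:
        # Too short for a reliable guess: fall back to a caller-chosen default.
        return default
    return detect(text)


if __name__ == "__main__":
    stub = lambda t: "fr"  # stands in for a real call to this endpoint
    print(pick_language("Bonjour", stub))                          # en
    print(pick_language("Bonjour, comment allez-vous ce matin ?", stub))  # fr
```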

Request Body schema: application/json
required
text
required
string (Text)

Responses

Request samples

Content type
application/json
{
  "text": "What language is this?"
}

Response samples

Content type
application/json
{
  "language": "en"
}

Similar Terms

This endpoint identifies the words most similar to a submitted text. The text fingerprint is compared with all known word fingerprints, and the words with the highest similarity are returned.
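The ranking step can be sketched as follows. The vocabulary of word fingerprints is a toy, hypothetical example, cosine similarity over sets of active positions is an assumed measure, and options such as nouns_only and context are not modeled.

```python
import math
from typing import Dict, List, Set


def cosine(a: Set[int], b: Set[int]) -> float:
    """Cosine similarity of two binary fingerprints given as sets of active positions."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_terms(text_fp: Set[int], word_fps: Dict[str, Set[int]], limit: int = 10) -> List[str]:
    """Compare the text fingerprint with every known word fingerprint; return the top `limit` words."""
    # Highest similarity first; ties broken alphabetically for a stable order.
    ranked = sorted(word_fps, key=lambda w: (-cosine(text_fp, word_fps[w]), w))
    return ranked[:limit]


if __name__ == "__main__":
    vocab = {"tiger": {1, 2, 3, 9}, "cat": {1, 2, 7, 8}, "bond": {20, 21}}
    print(similar_terms({1, 2, 3, 4}, vocab, limit=2))  # ['tiger', 'cat']
```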

query Parameters
limit
integer (Limit) [ 1 .. 1000 ]
Default: 10

Number of terms to return

nouns_only
boolean (Nouns Only)
Default: true

Whether to return only nouns

context
boolean (Context)
Default: false

Whether to apply context removal when finding the labels

Request Body schema: application/json
required
text
required
string (Text) .*\S.*

The input text. Cannot be empty.

language
string (Language) ^([a-z][a-z])?$

Language of the text, e.g. 'en' (ISO 639-1). If not provided, the service will try to infer it from the text.

Responses

Request samples

Content type
application/json
{
  "text": "Cortical.io’s mission is to deliver AI-based solutions that streamline the extraction, classification, review and analysis of information hidden in unstructured text while providing short time to value. We accomplish this through our novel, meaning-based approach to natural language understanding that solves many critical challenges of text processing in a business context. With more than 10 years of expertise in implementing intelligent document processing solutions in the enterprise, Cortical.io has demonstrated its ability to solve the challenges of language ambiguity and variability across many use cases and verticals and is trusted by major companies across the globe.",
  "language": "en"
}

Response samples

Content type
application/json
{
  "labels": [],
  "language": "en"
}

Text Segmentation

This endpoint splits a given text into smaller segments called 'topical paragraphs'. Topical paragraphs are built by merging adjacent segments until a shift in linguistic cues or topic is detected.
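The merging idea can be illustrated with a greedy sketch: walk through the sentences in order, add each one to the current paragraph while its fingerprint stays similar enough to the paragraph's, and start a new paragraph when the similarity drops below a threshold. The sentence fingerprints, the cosine measure and the 0.2 threshold are assumptions; the service's actual linguistic cues are not described here. The max_results argument mirrors the query parameter of the same name: a negative value returns all segments.

```python
import math
from typing import List, Set


def cosine(a: Set[int], b: Set[int]) -> float:
    """Cosine similarity of two binary fingerprints given as sets of active positions."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def segment(sentence_fps: List[Set[int]], threshold: float = 0.2, max_results: int = -1) -> List[List[int]]:
    """Group sentence indices into topical paragraphs by merging adjacent sentences."""
    if not sentence_fps:
        return []
    segments = [[0]]
    current = set(sentence_fps[0])
    for i in range(1, len(sentence_fps)):
        fp = sentence_fps[i]
        if cosine(current, fp) >= threshold:
            # Same topic: extend the current paragraph and its fingerprint.
            segments[-1].append(i)
            current |= fp
        else:
            # Topic shift detected: start a new paragraph.
            segments.append([i])
            current = set(fp)
    return segments if max_results < 0 else segments[:max_results]


if __name__ == "__main__":
    fps = [{1, 2, 3}, {2, 3, 4}, {10, 11, 12}, {11, 12, 13}]
    print(segment(fps))  # [[0, 1], [2, 3]]
```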

query Parameters
max_results
integer (Max Results)
Default: -1

Maximum number of segments to return. If max_results is negative, all segments are returned.

Request Body schema: application/json
required
text
required
string (Text) .*\S.*

The input text. Cannot be empty.

language
string (Language) ^([a-z][a-z])?$

Language of the text, e.g. 'en' (ISO 639-1). If not provided, the service will try to infer it from the text.

Responses

Request samples

Content type
application/json
{
  "text": "Tigers mostly feed on large and medium-sized mammals, particularly ungulates weighing 60–250 kg (130–550 lb). Range-wide, the most selected prey are sambar deer, Manchurian wapiti, barasingha and wild boar. Tigers are capable of taking down larger prey like adult gaur and wild water buffalo, but opportunistically eat much smaller prey, such as monkeys, peafowl and other ground-based birds, hares, porcupines and fish. They also prey on other predators, including dogs, leopards, bears, snakes and crocodiles. Tiger attacks on adult Asian elephants and Indian rhinoceros have also been reported. The Middle English tigre and Old English tigras derive from Old French tigre, from Latin tigris. This was a borrowing of Classical Greek 'tigris', a foreign borrowing of unknown origin meaning 'tiger' and the river Tigris. The origin may have been the Persian word tigra ('pointed or sharp') and the Avestan word tigrhi ('arrow'), perhaps referring to the speed of the tiger's leap, although these words are not known to have any meanings associated with tigers. There are three other colour variants – white, golden and nearly stripeless snow white – that are now virtually non-existent in the wild due to the reduction of wild tiger populations, but continue in captive populations. The white tiger has white fur and sepia-brown stripes. The golden tiger has a pale golden pelage with a blond tone and reddish-brown stripes. The snow white tiger is a morph with extremely faint stripes and a pale reddish-brown ringed tail. Both snow white and golden tigers are homozygous for CORIN gene mutations.",
  "language": "en"
}

Response samples

Content type
application/json
{
  "segments": [],
  "language": "en"
}