Use the Semantic Fingerprint API to create semantic fingerprints from any kind of text in four languages (English, Spanish, French, German). Semantic fingerprints are text embeddings that can be used to train language models and to perform text operations such as search and classification efficiently.
This endpoint converts the input text into a semantic fingerprint. First, each word is converted into its fingerprint representation. Then these word representations are aggregated and sparsified to create the text fingerprint.
retina_name required | string (Retina Name) The name of the retina |
use_phrase_fingerprints | boolean (Use Phrase Fingerprints) Default: true Whether to tokenize and fingerprint known phrases. If false, fingerprints of the single tokens will be used. |
pos_filter | boolean (Pos Filter) Default: true Removes terms whose part of speech (POS) is not noun, verb, or adjective |
stop_filter | boolean (Stop Filter) Default: true Removes stopwords if enabled |
max_df | number (Max Df) Default: 0.1 Ignores terms with higher document frequency than specified threshold |
max_terms | integer (Max Terms) >= 0 Default: 50 Only uses n of the most useful terms |
pos_weighting | boolean (Pos Weighting) Default: true Weights nouns and proper nouns higher than adjectives and verbs if enabled. Other POS types are ignored |
tfidf_weighting | boolean (Tfidf Weighting) Default: true Weights terms by tf-idf if enabled |
inverse_position_weighting | boolean (Inverse Position Weighting) Default: false Weights positions by inverse frequency |
max_density | number (Max Density) [ 0 .. 1 ] Default: 0.02 Max density of the aggregated fingerprint |
text required | string (Text) |
{
  "text": "string"
}
{
  "fp_type": "binary",
  "representation": [
    null
  ]
}
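The aggregation and sparsification steps described for this endpoint can be pictured with a toy model. The sketch below is not the service's actual algorithm: it assumes that a word fingerprint is a list of active bit positions, that aggregation counts how often each position is active, and that sparsification keeps the most frequent positions up to a max_density share of the fingerprint size. The function name and the size parameter are hypothetical.

```python
from collections import Counter


def aggregate_and_sparsify(word_fingerprints, size, max_density=0.02):
    """Toy text fingerprint: count active positions over all word
    fingerprints, then keep only the most frequently active ones."""
    counts = Counter()
    for fingerprint in word_fingerprints:
        # Each position counts once per word, even if listed twice.
        counts.update(set(fingerprint))

    # Upper bound on active positions implied by the density limit.
    max_active = int(size * max_density)

    # Rank by descending count; ties are broken by position so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return sorted(position for position, _ in ranked[:max_active])


if __name__ == "__main__":
    words = [[1, 2, 3], [2, 3, 4], [3, 5]]
    print(aggregate_and_sparsify(words, size=8, max_density=0.25))
```

In this toy model, a lower max_density keeps fewer positions, so only the bits shared by many words survive in the text fingerprint.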
Use the Natural Language Processing API to extract keywords, detect languages, compare documents, generate labels, or segment long pieces of text.
This endpoint returns the list of supported languages. The two-letter language codes can be used as input for other endpoints. The service supports semantic operations for these languages.
{
  "supported_languages": [
    "en",
    "de"
  ]
}
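A client may want to check a language code against this list before passing it to another endpoint. The following is a minimal sketch that works on the response body shown above; the helper names are hypothetical, and how the response is fetched is left out.

```python
import json


def supported_languages(response_body):
    """Parse the response body and return the language codes as a set."""
    return set(json.loads(response_body)["supported_languages"])


def check_language(code, supported):
    """Return the code unchanged if it is supported, else raise ValueError."""
    if code not in supported:
        raise ValueError(f"unsupported language code: {code!r}")
    return code


if __name__ == "__main__":
    body = '{"supported_languages": ["en", "de"]}'
    print(check_language("de", supported_languages(body)))
```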
This endpoint extracts the semantically most relevant words from a given text. The "number of keywords" parameter can ideally be selected in proportion to the length of the text (default is 10 terms).
limit | integer (Limit) >= 0 Default: 0 Maximum number of keywords to return. If unspecified or equal to zero, an appropriate number is determined automatically. |
text required | string (Text) .*\S.* The input text. Cannot be empty. |
language | string (Language) ^([a-z][a-z])?$ Language of the text, e.g. 'en' (ISO 639-1). If not provided, the service will try to infer it from the text. |
{
  "text": "Cortical.io’s mission is to deliver AI-based solutions that streamline the extraction, classification, review and analysis of information hidden in unstructured text while providing short time to value. We accomplish this through our novel, meaning-based approach to natural language understanding that solves many critical challenges of text processing in a business context. With more than 10 years expertise in implementing intelligent document processing solutions in the enterprise, Cortical.io has demonstrated its ability to solve the challenges of language ambiguity and variability across many use cases and verticals and is trusted by major companies across the globe.",
  "language": "en"
}
{
  "keywords": [
    {
      "word": "string",
      "document_frequency": 0,
      "pos_tags": [
        "NOUN"
      ],
      "score": 0
    }
  ],
  "language": "en"
}
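The request body constraints for this endpoint (a non-blank text and an optional two-letter language code) can be checked on the client before sending. The sketch below reuses the two patterns from the parameter list; the helper names are hypothetical, and ranking by descending score assumes that a higher score means a more relevant keyword, which the description does not state explicitly.

```python
import json
import re

# Patterns copied from the parameter descriptions of this endpoint.
TEXT_PATTERN = re.compile(r".*\S.*")
LANGUAGE_PATTERN = re.compile(r"^([a-z][a-z])?$")


def build_keywords_body(text, language=None):
    """Validate the inputs and return the JSON request body as a string."""
    if not TEXT_PATTERN.search(text):
        raise ValueError("text must contain a non-whitespace character")
    body = {"text": text}
    if language is not None:
        if not LANGUAGE_PATTERN.fullmatch(language):
            raise ValueError("language must be two lowercase letters")
        body["language"] = language
    return json.dumps(body)


def top_keywords(response_body, n):
    """Return the n highest-scoring keyword strings from a response body.
    Assumption: a higher score means a more relevant keyword."""
    keywords = json.loads(response_body)["keywords"]
    ranked = sorted(keywords, key=lambda item: item["score"], reverse=True)
    return [item["word"] for item in ranked[:n]]
```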
This endpoint computes the semantic similarity between two texts and returns a value in the range [0, 1].
text required | string (Text) .*\S.* The input text. Cannot be empty. |
language | string (Language) ^([a-z][a-z])?$ Language of the text, e.g. 'en' (ISO 639-1). If not provided, the service will try to infer it from the text. |
[
  {
    "text": "organ",
    "language": "en"
  },
  {
    "text": "piano",
    "language": "en"
  }
]
{
  "similarity": 0.7,
  "languages": [
    "en",
    "en"
  ]
}
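A comparison request could be assembled and sent roughly as follows. This is a sketch using only the Python standard library: endpoint_url is a placeholder for the compare endpoint's actual URL, which is not given here, and any authentication the service requires is omitted. The helper names are hypothetical.

```python
import json
import urllib.request


def build_compare_body(text_a, text_b, language=None):
    """Return the JSON array of two text objects used as the request body."""
    items = []
    for text in (text_a, text_b):
        item = {"text": text}
        if language is not None:
            item["language"] = language
        items.append(item)
    return json.dumps(items)


def post_json(endpoint_url, body):
    """POST a JSON string to endpoint_url (a placeholder) and decode the
    JSON reply. Authentication headers, if needed, are not covered."""
    request = urllib.request.Request(
        endpoint_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def similarity_from_response(payload):
    """Extract the similarity and check that it lies in [0, 1]."""
    value = payload["similarity"]
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"similarity out of range: {value}")
    return value
```

With a real URL in hand, the three helpers chain as similarity_from_response(post_json(endpoint_url, build_compare_body("organ", "piano", "en"))).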
This endpoint detects the language of a given text by using a fastText language detection model. The model is not precise on very short texts (a few words are needed). If the input text contains multiple languages, the endpoint will only return the language with the highest confidence level.
text required | string (Text) |
{
  "text": "What language is this?"
}
{
  "language": "en"
}
This endpoint identifies the most similar words for a submitted text. The text-fingerprint is compared to all known word-fingerprints and the words with the highest similarities are returned.
limit | integer (Limit) [ 1 .. 1000 ] Default: 10 Number of terms to return |
nouns_only | boolean (Nouns Only) Default: true Whether to return only nouns |
context | boolean (Context) Default: false Whether to use context removal in finding the labels |
text required | string (Text) .*\S.* The input text. Cannot be empty. |
language | string (Language) ^([a-z][a-z])?$ Language of the text, e.g. 'en' (ISO 639-1). If not provided, the service will try to infer it from the text. |
{
  "text": "Cortical.io’s mission is to deliver AI-based solutions that streamline the extraction, classification, review and analysis of information hidden in unstructured text while providing short time to value. We accomplish this through our novel, meaning-based approach to natural language understanding that solves many critical challenges of text processing in a business context. With more than 10 years expertise in implementing intelligent document processing solutions in the enterprise, Cortical.io has demonstrated its ability to solve the challenges of language ambiguity and variability across many use cases and verticals and is trusted by major companies across the globe.",
  "language": "en"
}
{
  "labels": [
    {
      "word": "string",
      "document_frequency": 0,
      "pos_tags": [
        "NOUN"
      ]
    }
  ],
  "language": "en"
}
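The limit, nouns_only, and context options for this endpoint can be validated and serialized ahead of the call. The sketch below assumes they are sent as URL query parameters and that booleans are written as lowercase true/false; neither detail is confirmed by the text above, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_labels_query(limit=10, nouns_only=True, context=False):
    """Return a query string for the label options.
    Assumption: the options travel in the URL query string."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be in the range 1..1000")
    return urlencode(
        {
            "limit": limit,
            # Assumption: booleans are serialized as lowercase words.
            "nouns_only": str(nouns_only).lower(),
            "context": str(context).lower(),
        }
    )


if __name__ == "__main__":
    print(build_labels_query(limit=5, nouns_only=False))
```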
This endpoint breaks down a given text into smaller segments, referred to as 'topical paragraphs'. These 'topical paragraphs' are computed by combining adjacent segments until a shift in linguistic cues or topics is identified.
max_results | integer (Max Results) Default: -1 Maximum number of segments to return. Returns all segments if max_results < 0 |
text required | string (Text) .*\S.* The input text. Cannot be empty. |
language | string (Language) ^([a-z][a-z])?$ Language of the text, e.g. 'en' (ISO 639-1). If not provided, the service will try to infer it from the text. |
{
  "text": "Tigers mostly feed on large and medium-sized mammals, particularly ungulates weighing 60–250 kg (130–550 lb). Range-wide, the most selected prey are sambar deer, Manchurian wapiti, barasingha and wild boar. Tigers are capable of taking down larger prey like adult gaur and wild water buffalo, but opportunistically eat much smaller prey, such as monkeys, peafowl and other ground-based birds, hares, porcupines and fish. They also prey on other predators, including dogs, leopards, bears, snakes and crocodiles. Tiger attacks on adult Asian elephants and Indian rhinoceros have also been reported. The Middle English tigre and Old English tigras derive from Old French tigre, from Latin tigris. This was a borrowing of Classical Greek 'tigris', a foreign borrowing of unknown origin meaning 'tiger' and the river Tigris. The origin may have been the Persian word tigra ('pointed or sharp') and the Avestan word tigrhi ('arrow'), perhaps referring to the speed of the tiger's leap, although these words are not known to have any meanings associated with tigers. The origin may have been the Persian word tigra ('pointed or sharp') and the Avestan word tigrhi ('arrow'), perhaps referring to the speed of the tiger's leap, although these words are not known to have any meanings associated with tigers. There are three other colour variants – white, golden and nearly stripeless snow white – that are now virtually non-existent in the wild due to the reduction of wild tiger populations, but continue in captive populations. The white tiger has white fur and sepia-brown stripes. The golden tiger has a pale golden pelage with a blond tone and reddish-brown stripes. The snow white tiger is a morph with extremely faint stripes and a pale reddish-brown ringed tail. Both snow white and golden tigers are homozygous for CORIN gene mutations.",
  "language": "en"
}
{
  "segments": [
    "string"
  ],
  "language": "en"
}
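The idea of combining adjacent segments until the topic shifts can be illustrated with a toy model. The sketch below is not the service's algorithm: it assumes sentences as the starting segments and uses plain word overlap (Jaccard similarity) as a stand-in for the linguistic and topical cues. The function names and the threshold value are hypothetical; max_results follows the rule above that a negative value returns all segments.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topical_paragraphs(sentences, threshold=0.1, max_results=-1):
    """Merge adjacent sentences while they share vocabulary with the
    current segment; start a new segment when the overlap drops."""
    segments = []
    current, current_words = [], set()
    for sentence in sentences:
        words = {w.lower().strip(".,") for w in sentence.split()}
        if current and jaccard(current_words, words) < threshold:
            # Overlap too low: treat this as a topic shift.
            segments.append(" ".join(current))
            current, current_words = [], set()
        current.append(sentence)
        current_words |= words
    if current:
        segments.append(" ".join(current))
    # A negative max_results means "return every segment".
    return segments if max_results < 0 else segments[:max_results]
```

For example, two sentences about what tigers eat stay in one segment, while a following sentence about the word's Latin origin shares no vocabulary with them and starts a new one.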