SDKs over message broker

nlpsdk.measurement module

measurement_detection(text, delay=15)

Detect measurements present in the given text, whether expressed in one, two, or multiple dimensions.

Parameters:
  • text (str) – The text in which measurements are to be detected.
  • delay (int, optional) – The maximum wait time for async call to return.
Returns:

A response dictionary containing the request status, the detected language, and an entities list with one dictionary per detected measurement.

Keys of the entity dictionaries:
  • type: The measurement type (stable dim, relative dim, past dim, one/two/any dim).
  • text: Measurement value.
  • Begin: Entity start position in input text.
  • End: Entity end position in input text.

Example

>>> from nlpsdk.measurement import measurement_detection
>>> text = '''Additional findings on attenuation correction CT scan: There are several scattered
... non-enlarged cervical nodes bilaterally, as mentioned above, with overall interval
... stability as compared to 09-Dec-2008.Attenuation correction head and neck CT is
... otherwise unremarkable.Patient is status post thyroidectomy. No enlarged mediastinal
... nodes are seen. Numerous small nodules are seen scattered throughout both lungs.
... There has been interval increase in size of some of these nodules, some examples
... include: RLL nodule (4: 74), RUL nodules (4: 50, 51). No new nodule was identified.Heart
... is normal in size.Main pulmonary artery is distended at 33 x 34 x 23 mm, as before.No pleural
... or pericardial effusion.Normal non contrast appearance of the liver, spleen, pancreas,
... adrenal glands, and left kidney. 14 x 19 mm lesion with fat density in the right kidney,
... likely angiomyolipoma. 13 x 23, and 17 x 19 x 23 mm (3: 119, 153) hypodense lesions in the
... right kidney. Further characterization is not possible on this limited CT. Statistically,
... these are most likely to represent cysts. Ultrasound recommended at clinical discretion.
... No enlarged retroperitoneal or mesenteric lymph nodes is seen. No free fluid. Retained
... oral contrast is seen in the colon, otherwise normal-appearing bone on non contrast CT.
... Normal appearance of the urinary bladder. Normal noncontrast appearance of the uterus and
... left adnexa. 37 x 39 mm soft tissue density seen to the right side of uterus, stable.
... This could represent a pedunculated myoma, or an adnexal lesion.  Pelvic ultrasound is
... suggested at clinical discretion. No suspicious bony lesion of length 5mm is identified.'''
>>> output_measure = measurement_detection(text)
>>> print(output_measure)
{
    "status": "OK",
    "language": "English",
    "entities": [
        {
            "type": "three_dim",
            "text": "33 x 34 x 23 mm",
            "Begin": 699,
            "End": "714"
        },
        {
            "type": "two_dim",
            "text": "14 x 19 mm",
            "Begin": 875,
            "End": "885"
        },
        {
            "type": "two_dim",
            "text": "13 x 23",
            "Begin": 962,
            "End": "969"
        },
        {
            "type": "three_dim",
            "text": "17 x 19 x 23 mm",
            "Begin": 975,
            "End": "990"
        },
        {
            "type": "Present",
            "text": "37 x 39 mm soft tissue density seen",
            "Begin": 1538,
            "End": "1573"
        },
        {
            "type": "Present",
            "text": "5mm is identified",
            "Begin": 1786,
            "End": "1803"
        }
    ]
}
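
A minimal post-processing sketch, assuming the response is available (or has been parsed, for example with json.loads) as a Python dictionary with the shape printed above:

>>> # assumes output_measure behaves like the dictionary printed above
>>> # collect the text of every three-dimensional measurement in the response
>>> three_dim = [e["text"] for e in output_measure["entities"] if e["type"] == "three_dim"]
>>> # for the report above this yields '33 x 34 x 23 mm' and '17 x 19 x 23 mm'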

nlpsdk.negation module

negation_detection(text, target_list, delay=15)

Given one or more target entities, detect whether each has been negated, and describe the type of negation (e.g. definite_negated, mild).

Parameters:
  • text (string) – The sentence(s) in which the negating words and the target entities appear.
  • target_list (string) – The target entities, separated by commas.
  • delay (int, optional) – The maximum wait time for async call to return.
Returns:

A response dictionary containing the request status, the detected language, and an entities list with one dictionary per target entity.

Keys of the entity dictionaries:
  • type: Whether the entity is a user-specified or a system-specified entity.
  • sentiment: Has three attributes: type (the kind of negation, e.g. definite_negated, probable negated, mild), Direction (the direction in which it was negated: forward, backward, or bidirectional), and Phrase (the negating word found in the text).
  • text: The specified target entity.
  • Begin: Entity start position in input text.
  • End: Entity end position in input text.

Example

>>> from nlpsdk.negation import negation_detection
>>> text = "There is no lung cancer detected."
>>> target_list = "lung cancer"
>>> output_negation = negation_detection(text, target_list)
>>> print(output_negation)
{
    "status": "OK",
    "language": "English",
    "entities": [
        {
            "type": "user specified target entity",
            "sentiment": {
                "type": "definite_negated_existence",
                "Direction": "forward",
                "Phrase": "no"
            },
            "text": "lung cancer",
            "Begin": 12,
            "End": 23
        }
    ]
}
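
A minimal follow-up sketch, again assuming the response is available as a Python dictionary with the shape printed above:

>>> # assumes output_negation behaves like the dictionary printed above
>>> # report how each target entity was negated
>>> for entity in output_negation["entities"]:
...     sentiment = entity["sentiment"]
...     print(entity["text"] + " -> " + sentiment["type"] + " (" + sentiment["Phrase"] + ")")
lung cancer -> definite_negated_existence (no)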

nlpsdk.taggers module

anatomy_tagger(tokenized_words, delay=30)

Find anatomies present in the given list of tokenized words.

Anatomy Tagger is a subtask of Information Extraction that extracts anatomies from the given report.

Parameters:
  • tokenized_words (list) – A list of tokenized words. If the text contains multiple sentences, it should be tokenized into sentences first and then each sentence into words (giving a list of word lists).
  • delay (int, optional) – The maximum wait time for async call to return.
Returns:

A nested list in which each tokenized word from the input is paired with an anatomy tag (BA/IA/O).

Tag acronym definitions:
  • BA: Begin Anatomy.
  • IA: Intermediate Anatomy.
  • O: Others.

Example

>>> from nlpsdk.taggers import anatomy_tagger
>>> from nlpsdk.tokenizers import word_tokenizer
>>> from nlpsdk.tokenizers import sentence_tokenizer
>>> text = "Heart is functioning normal. Lungs are normal"
>>> #tokenizing the report or document into sentences.
>>> sentences = sentence_tokenizer(text,"nltk")
>>> print(sentences)
[u'Heart is functioning normal.', u'Lungs are normal']
>>> #tokenizing the sentences into words.
>>> words = []
>>> for sentence in sentences:
...     words.append(word_tokenizer(sentence, "nltk"))
>>> print(words)
[[u'Heart', u'is', u'functioning', u'normal', u'.'], [u'Lungs', u'are', u'normal']]
>>> #Predicting the anatomies present in the document.
>>> anatomies = anatomy_tagger(words)
>>> print(anatomies)
[[[u'Heart', u'BA'], [u'is', u'O'], [u'functioning', u'O'], [u'normal', u'O'], [u'.', u'O']], [[u'Lungs', u'BA'], [u'are', u'O'], [u'normal', u'O']]]
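
Since BA marks the beginning of an anatomy phrase, IA its continuation, and O everything else, the tagged output can be folded back into anatomy phrases. A minimal post-processing sketch over the anatomies result above:

>>> # join consecutive BA/IA-tagged tokens back into anatomy phrases
>>> phrases = []
>>> for tagged_sentence in anatomies:
...     current = []
...     for word, tag in tagged_sentence:
...         if tag == "BA":              # a new anatomy phrase begins
...             if current:
...                 phrases.append(" ".join(current))
...             current = [word]
...         elif tag == "IA":            # continuation of the current phrase
...             current.append(word)
...         elif current:                # an "O" tag closes any open phrase
...             phrases.append(" ".join(current))
...             current = []
...     if current:
...         phrases.append(" ".join(current))
>>> print(phrases)
[u'Heart', u'Lungs']
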
ner_tagger(text, model, delay=15)

Identify named entities such as anatomies, findings, etc.

Parameters:
  • text (string) – The text to be tagged.
  • model (string) – The model to be used for tagging. Currently supports ctakes and crf.
  • delay (int, optional) – The maximum wait time for async call to return.
Returns:

A list of dictionaries corresponding to each named entity.

Keys of the dictionaries:
  • sentiment: Sentiment of the finding (Named entity).
  • Begin: Entity start position in input text.
  • End: Entity end position in input text.
  • text: Named entity.
  • type: Named entity type.

Example

>>> from nlpsdk.taggers import ner_tagger
>>> txt = "there is a small amount of blood layering in the
... occipital horns of both lateral ventricles, unchanged though not
... as dense given evolution."
>>> ner = ner_tagger(txt,"ctakes")
>>> print(ner)
[{u'Begin': 28,
u'End': 33,
u'sentiment': {u'type': u'Normal'},
u'text': u'blood',
u'type': u'Anatomy'},
{u'Begin': 60,
u'End': 65,
u'sentiment': {u'type': u'Normal'},
u'text': u'horns',
u'type': u'Anatomy'},
{u'Begin': 74,
u'End': 81,
u'sentiment': {u'type': u'Normal'},
u'text': u'lateral',
u'type': u'Location'},
{u'Begin': 74,
u'End': 92,
u'sentiment': {u'type': u'Normal'},
u'text': u'lateral ventricles',
u'type': u'Anatomy'},
{u'Begin': 94,
u'End': 103,
u'sentiment': {u'type': u'Negative'},
u'text': u'unchanged',
u'type': u'NegativeFinding'}]
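
A minimal post-processing sketch that groups the entities returned above by their type:

>>> by_type = {}
>>> for entity in ner:
...     by_type.setdefault(entity["type"], []).append(entity["text"])
>>> for entity_type in sorted(by_type):
...     print(entity_type + ": " + ", ".join(by_type[entity_type]))
Anatomy: blood, horns, lateral ventricles
Location: lateral
NegativeFinding: unchanged
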
pos_tagger(sentence, model, delay=30)

The part-of-speech tagger (POS tagger) identifies and tags the part of speech of each word, such as noun, verb, adjective, etc.

Parameters:
  • sentence (string/list) – A string for the ctakes model, or a list of words for the nltk and lstm models. The text to be tagged word by word with POS tags.
  • model (string) – The model to be used for tagging. Currently supports nltk, lstm and ctakes.
  • delay (int, optional) – The maximum wait time for async call to return.
Returns:

A list of [word, POS tag] pairs, one pair per word.

Example

>>> from nlpsdk.taggers import pos_tagger
>>> from nlpsdk.tokenizers import word_tokenizer
>>> txt = "there is a small amount of blood layering in the
... occipital horns of both lateral ventricles, unchanged though not
... as dense given evolution."
>>> words = word_tokenizer(txt, "nltk") # here input object is of type `list`
>>> print(words, len(words))
[u'there', u'is', u'a', u'small', u'amount', u'of', u'blood', u'layering', u'in', u'the', u'occipital',
u'horns', u'of', u'both', u'lateral', u'ventricles', u',', u'unchanged', u'though', u'not', u'as', u'dense',
u'given', u'evolution', u'.'] 25
>>> # for nltk model
>>> pos = pos_tagger(words,"nltk")
>>> print(pos)
[[u'there', u'EX'],
[u'is', u'VBZ'],
[u'a', u'DT'],
[u'small', u'JJ'],
[u'amount', u'NN'],
[u'of', u'IN'],
[u'blood', u'NN'],
[u'layering', u'NN'],
[u'in', u'IN'],
[u'the', u'DT'],
[u'occipital', u'JJ'],
[u'horns', u'NNS'],
[u'of', u'IN'],
[u'both', u'DT'],
[u'lateral', u'JJ'],
[u'ventricles', u'NNS'],
[u',', u','],
[u'unchanged', u'JJ'],
[u'though', u'RB'],
[u'not', u'RB'],
[u'as', u'IN'],
[u'dense', u'NN'],
[u'given', u'VBN'],
[u'evolution', u'NN'],
[u'.', u'.']]
>>> #for ctakes model
>>> pos = pos_tagger(txt,"ctakes") # here input object is of type `str`
>>> print(pos)
[[u'there', u'EX'],
[u'is', u'VBZ'],
[u'a', u'DT'],
[u'small', u'JJ'],
[u'amount', u'NN'],
[u'of', u'IN'],
[u'blood', u'NN'],
[u'layering', u'NN'],
[u'in', u'IN'],
[u'the', u'CD'],
[u'occipital', u'JJ'],
[u'horns', u'NNS'],
[u'of', u'IN'],
[u'both', u'DT'],
[u'lateral', u'JJ'],
[u'ventricles', u'NNS'],
[u',', u','],
[u'unchanged', u'JJ'],
[u'though', u'IN'],
[u'not', u'RB'],
[u'as', u'IN'],
[u'dense', u'JJ'],
[u'given', u'VBN'],
[u'evolution', u'NN'],
[u'.', u'.']]
>>> #for lstm model
>>> pos = pos_tagger(words,"lstm") # here input object is of type `list`
>>> print(pos)
[[[u'there', u'EX'],
[u'is', u'VBZ'],
[u'a', u'DT'],
[u'small', u'JJ'],
[u'amount', u'NN'],
[u'of', u'IN'],
[u'blood', u'NN'],
[u'UNK', u'NN'],
[u'in', u'IN'],
[u'the', u'DT'],
[u'UNK', u'JJ'],
[u'UNK', u'NN'],
[u'of', u'IN'],
[u'both', u'CC'],
[u'UNK', u'JJ'],
[u'UNK', u'NN'],
[u',', u','],
[u'unchanged', u'JJ'],
[u'though', u'IN'],
[u'not', u'RB'],
[u'as', u'IN'],
[u'dense', u'JJ'],
[u'given', u'VBN'],
[u'evolution', u'NN'],
[u'.', u'.']]]
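
A small follow-up sketch, filtering the nltk-tagged [word, tag] pairs above to collect just the nouns (NN/NNS tags):

>>> tagged = pos_tagger(words, "nltk")
>>> nouns = [word for word, tag in tagged if tag in ("NN", "NNS")]
>>> print(nouns)
[u'amount', u'blood', u'layering', u'horns', u'ventricles', u'dense', u'evolution']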

nlpsdk.tokenizers module

section_tokenizer(text, model, delay=15)

Tokenizes a given text into a list of sections.

Parameters:
  • text (string) – The text to be tokenized.
  • model (string) – The model to be used for tokenization. Currently supports dsp.
  • delay (int, optional) – The maximum wait time for async call to return.
Returns:

A nested list whose first element is the list of detected section headers and whose second element is the list of corresponding section texts.

Return type:

list

Example

>>> from nlpsdk.tokenizers import section_tokenizer
>>> txt = ''' right shoulder x-ray ([**2162-3-11**])
...     impression    :    aspiration of thin liquids and puree
...     for additional information, please see the speech and swallow therapist's report from the same day.
...
...     findings : there is a small amount of blood layering in the
...     occipital horns of both lateral ventricles, unchanged though not
...     as dense given evolution. no new hemorrhage is identified. the
...     ventricles, cisterns, and sulci are enlarged secondary to
...     involutional change. periventricular white matter hyperdensities
...     are sequelae of chronic small vessel ischemia. encephalomalacia
...     in the left cerebell    ar hemisphere secondary to old infarction is
...     unchanged. the osseous structures are unremarkable. the
...     visualized paranasal sinuses and mastoid air cells are clear.
...     skin staples are noted along the superior- posterior neck
...     secondary to recent spinal surgery.'''
>>> sections = section_tokenizer(txt, model='dsp')
>>> print(sections)
[[u'impression', u'findings'],
 [u"    aspiration of thin liquids and puree for additional information,
  please see the speech and swallow therapist's report from the same day.",
  u' there is a small amount of blood layering in the occipital
  horns of both lateral ventricles, unchanged though not as dense
  given evolution. no new hemorrhage is identified. the ventricles,
  cisterns, and sulci are enlarged secondary to involutional change.
  periventricular white matter hyperdensities are sequelae of chronic
  small vessel ischemia. encephalomalacia in the left cerebell
  ar hemisphere secondary to old infarction is unchanged. the osseous
  structures are unremarkable. the visualized paranasal sinuses and
  mastoid air cells are clear. skin staples are noted along the
  superior- posterior neck secondary to recent spinal surgery.']]
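
The first element of the result holds the detected section headers and the second holds the corresponding section texts, so the two can be zipped into a lookup. A minimal sketch:

>>> headers, bodies = sections
>>> report_sections = dict(zip(headers, bodies))
>>> print(sorted(report_sections.keys()))
[u'findings', u'impression']
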
sentence_tokenizer(text, model, delay=15)

Tokenizes a given text into a list of sentences.

Parameters:
  • text (string) – The text to be tokenized.
  • model (string) – The model to be used for tokenization. Currently supports nltk and ctakes.
  • delay (int, optional) – The maximum wait time for async call to return.
Returns:

A list with sentences as its elements.

Return type:

list

Example

>>> from nlpsdk.tokenizers import sentence_tokenizer
>>> txt = ''' There is a small amount of blood layering in the
...     occipital horns of both lateral ventricles, unchanged though not
...     as dense given evolution. no new hemorrhage is identified. the
...     ventricles, cisterns, and sulci are enlarged secondary to
...     involutional change. periventricular white matter hyperdensities
...     are sequelae of chronic small vessel ischemia. encephalomalacia
...     in the left cerebellar hemisphere secondary to old infarction is
...     unchanged. the osseous structures are unremarkable. the
...     visualized paranasal sinuses and mastoid air cells are clear.
...     skin staples are noted along the superior- posterior neck
...     secondary to recent spinal surgery.'''
>>> # for nltk model
>>> sentences = sentence_tokenizer(txt, 'nltk')
>>> print(sentences)
[u'there is a small amount of blood layering in the occipital horns of both lateral ventricles, unchanged though not as dense given evolution. no new hemorrhage is identified. the ventricles, cisterns, and sulci are enlarged secondary to involutional change.',
u'periventricular white matter hyperdensities are sequelae of chronic small vessel ischemia.',
u'encephalomalacia in the left cerebellar hemisphere secondary to old infarction is unchanged.',
u'the osseous structures are unremarkable.',
u'the visualized paranasal sinuses and mastoid air cells are clear.',
u'skin staples are noted along the superior- posterior neck secondary to recent spinal surgery.']
>>> # for ctakes model
>>> sentences = sentence_tokenizer(txt, 'ctakes')
>>> print(sentences)
[u'there is a small amount of blood layering in the occipital horns of both lateral ventricles, unchanged though not as dense given evolution. no new hemorrhage is identified. the ventricles, cisterns, and sulci are enlarged secondary to involutional change.',
u'periventricular white matter hyperdensities are sequelae of chronic small vessel ischemia.',
u'encephalomalacia in the left cerebellar hemisphere secondary to old infarction is unchanged.',
u'the osseous structures are unremarkable.',
u'the visualized paranasal sinuses and mastoid air cells are clear.',
u'skin staples are noted along the superior- posterior neck secondary to recent spinal surgery.']
word_tokenizer(sentence, model, delay=15)

Tokenizes a given sentence into a list of words.

Parameters:
  • sentence (string) – The text to be tokenized.
  • model (string) – The model to be used for tokenization. Currently supports nltk and ctakes.
  • delay (int, optional) – The maximum wait time for async call to return.
Returns:

A list with words as its elements.

Return type:

list

Example

>>> from nlpsdk.tokenizers import word_tokenizer
>>> txt = '''there is a small amount of blood layering in the
...     occipital horns of both lateral ventricles, unchanged
...     though not as dense given evolution.'''
>>> # for nltk model
>>> words = word_tokenizer(txt, "nltk")
>>> print(words)
[u'there',
 u'is',
 u'a',
 u'small',
 u'amount',
 u'of',
 u'blood',
 u'layering',
 u'in',
 u'the',
 u'occipital',
 u'horns',
 u'of',
 u'both',
 u'lateral',
 u'ventricles',
 u',',
 u'unchanged',
 u'though',
 u'not',
 u'as',
 u'dense',
 u'given',
 u'evolution',
 u'.']
>>> # for ctakes model
>>> words = word_tokenizer(txt, "ctakes")
>>> print(words)
[u'there',
u'is',
u'a',
u'small',
u'amount',
u'of',
u'blood',
u'layering',
u'in',
u'the',
u'occipital',
u'horns',
u'of',
u'both',
u'lateral',
u'ventricles',
u',',
u'unchanged',
u'though',
u'not',
u'as',
u'dense',
u'given',
u'evolution',
u'.']
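
Putting the pieces together, a short end-to-end sketch that chains the tokenizers and taggers documented above (the report text and the delay values are illustrative, not prescribed):

>>> from nlpsdk.tokenizers import sentence_tokenizer, word_tokenizer
>>> from nlpsdk.taggers import pos_tagger, anatomy_tagger
>>> report = "Heart is functioning normal. Lungs are normal"
>>> # split the report into sentences, then each sentence into words
>>> sentences = sentence_tokenizer(report, "nltk", delay=15)
>>> words = [word_tokenizer(sentence, "nltk") for sentence in sentences]
>>> # part-of-speech tag each tokenized sentence
>>> pos_tags = [pos_tagger(sentence_words, "nltk") for sentence_words in words]
>>> # tag the whole report for anatomies in a single call
>>> anatomies = anatomy_tagger(words, delay=30)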