My BioNLP Day

Sep 26

Status

Back from the depths, where I prepared and delivered two presentations over the course of two weeks, with the kind of immersion I require when I do that….

SHARP:

SuCCESS:

Other:

Aug 30

Clarity Personal Hx Query

Got an e-mail today with a good query for getting personal history out of Clarity.  A colleague put this together.

Look at use in medical hx of ICD9 codes with ‘colon’ in any of the diagnosis names for that code.  This includes a record for each time a person’s medical hx is updated.
This is a rough proxy until a detailed code list is created.






SELECT  dx.CURRENT_ICD9_LIST
     , max(dx.DX_NAME)  -- dedupe by choosing arbitrary dx_name for the icd code
     , count(*) as NumUsesInMedicalHx
  FROM MEDICAL_HX m
  LEFT OUTER JOIN CLARITY_EDG dx
    ON m.DX_ID = dx.DX_ID
 WHERE dx.DX_NAME like '%colon%'
   AND dx.DX_NAME not like '%colonization%'
 GROUP BY dx.CURRENT_ICD9_LIST
 ORDER BY dx.CURRENT_ICD9_LIST


Paper Summary

Read this paper in prep for our manuscript.  Seems like we’re on a similar path.

·    Australia is like an entire SEER site (but it’s not SEER) where you have to report certain cancers to the government.
·    Even though pathology labs do electronic reporting, the current reporting process to the government is done on paper.  This is dumb, so can we automate this?
·    The first step (corpus selection) is to identify notifiable reports.  These are all cytology and histology reports excluding urine, sputum, and pap smear.  This is done with a query on, I think, the HL7 data.
·    The second step is to identify the histology type to see if it is a cancer notifiable result.  This step itself has two steps.
·    Step 2a (NER) is to go over all the SNOMED CT concepts and reason over them (so rule based) to see if they are descendants of one of the notifiable concepts, of which there are six.  They pick the most advanced of the concepts from the report as the concept for the report.
·    Step 2b (status annotation) is to mark each of the concepts that fit the notifiable criteria as absent, possible, or present.
·    If any notifiable histologies are present and aren’t BCC or SCC of skin then the result is notifiable.
·    Then there’s some discussion about supporting reports which makes no sense.
·    “The ground truth was created based on an adjudication process between the reference data set provided by a domain expert and the output of the system for all reports in the development and evaluation set.”
·    Somehow their corpora ended up having roughly equal numbers of notifiable and non-notifiable reports, which seems crazy to me.
·    They report sensitivity, PPV, specificity and F-score.
·    They have 30 misclassifications over both training and test and report an “error rate” based on this.
·    They have 7 false negatives and 23 false positives, and report that the false negatives are more costly.
·    The false negatives are mostly due to sectioning and status annotation errors.
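
As a note to self, here is a minimal sketch of how I read the step 2 decision rule in the summary above. It is not the paper's implementation. The toy hierarchy, the concept names, and the function names are all made up for illustration (they are not real SNOMED CT identifiers), and I treat only "present" mentions as triggering notification, which is my reading of the rule.

```python
# Toy is-a hierarchy: child concept -> parent concept.
# All names are hypothetical stand-ins, not real SNOMED CT identifiers.
PARENT = {
    "adenocarcinoma of colon": "malignant neoplasm",
    "basal cell carcinoma of skin": "malignant neoplasm",
    "squamous cell carcinoma of skin": "malignant neoplasm",
    "tubular adenoma": "benign neoplasm",
}

# The paper uses six notifiable top-level concepts; one is enough for a sketch.
NOTIFIABLE_ROOTS = {"malignant neoplasm"}

# Skin BCC and SCC are excluded from notification.
EXCLUDED = {"basal cell carcinoma of skin", "squamous cell carcinoma of skin"}


def is_notifiable_concept(concept):
    """Step 2a: walk up the hierarchy looking for a notifiable ancestor."""
    node = concept
    while node is not None:
        if node in NOTIFIABLE_ROOTS:
            return True
        node = PARENT.get(node)  # None once we run off the top
    return False


def report_is_notifiable(mentions):
    """Final rule over (concept, status) pairs from step 2b.

    status is one of "absent", "possible", or "present".
    """
    return any(
        status == "present"
        and is_notifiable_concept(concept)
        and concept not in EXCLUDED
        for concept, status in mentions
    )


if __name__ == "__main__":
    report = [
        ("basal cell carcinoma of skin", "present"),
        ("adenocarcinoma of colon", "possible"),
    ]
    print(report_is_notifiable(report))
```

The status check is where a negation or uncertainty error would flip the answer, which fits with status annotation being one of the two main sources of false negatives.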

Aug 15

Mystery Meds

Here are some notes on my findings looking into mystery medications.  Remember, these are medications that do not start with brand name (BN) or ingredient (IN), a class of things I refer to as bnin.

There was a suggestion at the last meeting to use what I thought I was told was a ‘synonym’ column for the bnin listing.  It turns out that column was either more of a normalized form, or the bnin was normalized and the column I was to use was raw.  Whatever, using it worked well.
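
For reference, a minimal sketch of the "mystery medication" test as defined above: a medication string is a mystery if it does not start with any name in the bnin listing. This is not the actual project code. The function name and the sample strings are made up, and the case-insensitive prefix match is an assumption.

```python
def find_mystery_meds(med_names, bnin_names):
    """Return medication strings that do not start with any BN/IN name."""
    # Lowercase once so the prefix test is case-insensitive (an assumption).
    prefixes = tuple(name.strip().lower() for name in bnin_names if name.strip())
    return [
        med for med in med_names
        # str.startswith accepts a tuple and checks every prefix in it.
        if not med.strip().lower().startswith(prefixes)
    ]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    bnin = ["Lipitor", "aspirin"]
    meds = ["LIPITOR 10 mg tablet", "Aspirin 81 mg", "compression stockings"]
    print(find_mystery_meds(meds, bnin))
```

A plain prefix test will also match a short bnin name that happens to begin a longer, unrelated word, so a real version would probably want to check for a word boundary after the prefix.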

Modifications to code

Results

So that’s good.  On our status call today, it was pointed out to me that:

Next Steps:

Aug 10

IQR Indexes

Got most of the way through figuring out the descriptive stats of our corpora.  Only thing left to do is IQR and median on both test and training for progress and radiology notes combined.

The best thing I found for figuring out what indexes to use for IQR is on page 34 of my stats book.
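
A minimal sketch of one common convention for picking the indexes: sort the values, split them into a lower and upper half (leaving out the middle value when the count is odd), and take the median of each half as Q1 and Q3. Whether this matches the method on page 34 is an assumption, since other conventions interpolate differently and give slightly different quartiles. The function name is made up.

```python
import statistics


def median_and_iqr(values):
    """Return (median, q1, q3, iqr) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]      # indexes 0 .. half-1
    upper = data[n - half:]  # last `half` values; skips the middle when n is odd
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return statistics.median(data), q1, q3, q3 - q1


if __name__ == "__main__":
    # Made-up note lengths, just to exercise the function.
    print(median_and_iqr([1, 2, 3, 4, 5, 6, 7, 8]))
```

Python's `statistics.quantiles` offers interpolating alternatives (`method="exclusive"` or `method="inclusive"`) if the book turns out to use one of those.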

Aug 09

Schwing

Big win for me today, figuring out on my own how to set up a new set of notes in the Clinical Text Explorer so we could start to get some abstraction done on them.  I had to, among other things:

  1. Get a SAS programmer to get the notes and metadata out of SAS for me.
  2. Dump the notes each into their own file along with the metadata into an Excel file.
  3. Import the metadata and notes to a SQL Server table.
  4. Create a View in the proper form with the proper links to the colonoscopy data.
  5. Hunt down the lookup table the ADE app uses to populate its Text Source menu.  (This was my biggest victory since the two folks who know where this is are both effectively as reachable as one can be in the BWCA and I just pounded away at it.)

One thing I want to note is that I’m using the field Note_Dont_Use.  I named it that because I was worried there might be truncated notes in there, but I verified there weren’t.  So it’s cool.

Alcohol

Interesting meeting today on the Alcohol grant we’ll be submitting.  The takeaway for me, though, is that for the first part we’ll be trying to identify, using machine learning (or otherwise, I suppose), those patients who have evidence of a drinking problem.  E.g., ER admits while intoxicated, or ER visits for traumas or conditions we know are alcohol related, such as pancreatitis (I think).

Might actually fit into the delivery system since there’s a push to try to identify those patients who are undercoded…charted but not coded…and whose upcoding would lead to significantly increased revenue.

Aug 08

Other stuff I did today

Selecting Columns in Notepad++

Selecting tall rectangles

If you wish to select a very long column block that extends over many pages (for example, in a very long file), this might be the best technique:

Source.

Smartsets

Spent some time today looking for smart set data in Clarity.  I had a patient ID and a snippet of text, but wasn’t able to find it.  Though I did find one note with similar information in it.

The closest I was able to find was PAT_ENC_SMARTSET, which links a patient encounter to a smart set used in that encounter.  However, SMARTSET_ID doesn’t seem to link to anything in Clarity.  And anyway, it seems it would link to the smart set template, as it were, and not the encounter’s version of that smart set, which is what we’d want and should be on PAT_ENC_SMARTSET.