Back from the depths: I prepared and delivered two presentations over the course of two weeks, with the kind of immersion I require when I do that….
Got an e-mail today with a good query for getting personal history out of Clarity. A colleague put this together.
Look at use in medical hx of ICD-9 codes with ‘colon’ in any of the diagnosis names for that code. This includes a record for each time a person’s medical hx is updated.
This is a rough proxy until a detailed code list is created.
SELECT dx.CURRENT_ICD9_LIST
     , max(dx.DX_NAME) AS DX_NAME        -- dedupe by choosing an arbitrary DX_NAME for the ICD code
     , count(*) AS NumUsesInMedicalHx
FROM MEDICAL_HX m
LEFT OUTER JOIN CLARITY_EDG dx
    ON m.DX_ID = dx.DX_ID
WHERE dx.DX_NAME LIKE '%colon%'          -- note: filtering on dx here makes the LEFT OUTER JOIN behave as an inner join
  AND dx.DX_NAME NOT LIKE '%colonization%'
GROUP BY dx.CURRENT_ICD9_LIST
ORDER BY dx.CURRENT_ICD9_LIST
Read this paper in prep for our manuscript. Seems like we’re on a similar path.
· Australia is like an entire SEER site (but it’s not SEER) where you have to report certain cancers to the government.
· Even though pathology labs do electronic reporting, the current reporting process to the government is done on paper. This is dumb, so can we automate this?
· The first step (corpus selection) is to identify notifiable reports. These are all cytology and histology reports excluding urine, sputum, and pap smear. This is done with a query on, I think, the HL7 data.
· The second step is to identify the histology type to see if it is a cancer notifiable result. This step itself has two steps.
· Step 2a (NER) is to go over all the SNOMED CT concepts and reason over them (so, rule based) to see if they are descendants of one of the notifiable concepts, of which there are six. They pick the most advanced of the concepts from the report as the concept for the report (a sketch of the descendant check is after this list).
· Step 2b (status annotation) is to mark each of the concepts that fit the notifiable criteria as absent, possible, or present.
· If any notifiable histologies are present and aren’t BCC or SCC of skin, then the result is notifiable.
· Then there’s some discussion about supporting reports, which makes no sense to me.
· “The ground truth was created based on an adjudication process between the reference data set provided by a domain expert and the output of the system for all reports in the development and evaluation set.”
· Somehow their corpora ended up having roughly equal numbers of notifiable and non-notifiable reports, which seems crazy to me.
· They report sensitivity, PPV, specificity and F-score.
· They have 30 misclassifications over both training and test and report an “error rate” based on this.
· They have 7 false negatives and 23 false positives, and report that the false negatives are more costly.
· The false negatives are mostly due to sectioning and status annotation errors.
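To make step 2a concrete for myself: if the is-a relationships were sitting in a table, the descendant check could be a recursive query. This is just my sketch, not how the paper does it (they reason over the ontology directly), and every table and column name here (SNOMED_ISA, NOTIFIABLE_ROOTS, REPORT_CONCEPTS) is invented:

-- Hypothetical schema: SNOMED_ISA(CHILD_ID, PARENT_ID) holds is-a links,
-- NOTIFIABLE_ROOTS(CONCEPT_ID) holds the six notifiable root concepts, and
-- REPORT_CONCEPTS(REPORT_ID, CONCEPT_ID) holds the concepts found per report.
WITH descendants (CONCEPT_ID) AS (
    SELECT CONCEPT_ID FROM NOTIFIABLE_ROOTS          -- anchor: the six roots themselves
    UNION ALL
    SELECT isa.CHILD_ID                              -- recursive step: walk down the is-a links
    FROM SNOMED_ISA isa
    JOIN descendants d ON isa.PARENT_ID = d.CONCEPT_ID
)
SELECT DISTINCT rc.REPORT_ID, rc.CONCEPT_ID          -- reports carrying a notifiable concept
FROM REPORT_CONCEPTS rc
JOIN descendants d ON rc.CONCEPT_ID = d.CONCEPT_ID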
Here are some notes on my findings looking into mystery medications. Remember, these are medications that do not start with a brand name (BN) or an ingredient (IN), a class of things I refer to as bnin.
There was a suggestion at the last meeting to use what I thought I was told was a ‘synonym’ column for the bnin listing. It turns out that column was more of a normalized form; or rather, the bnin was normalized and the column I was to use was the raw one. Whatever the terminology, using it worked well.
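For my own notes, the matching has roughly the shape below; every name here (MEDS, BNIN_LIST, RAW_NAME) is a placeholder I’m inventing for this note, not a real table:

-- A med is a "mystery med" if its name doesn't start with any entry
-- in the bnin list; the raw (not normalized) column is what we match on.
SELECT m.MED_NAME
FROM MEDS m
WHERE NOT EXISTS (
    SELECT 1
    FROM BNIN_LIST bl
    WHERE UPPER(m.MED_NAME) LIKE UPPER(bl.RAW_NAME) + '%'   -- '+' is SQL Server string concat
)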
Modifications to code
So that’s good. On our status call today, it was pointed out to me that:
Got most of the way through figuring out the descriptive stats of our corpora. Only thing left to do is IQR and median on both test and training for progress and radiology notes combined.
The best thing I found for figuring out what indexes to use for IQR is on page 34 of my stats book.
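If it helps future me: the quartiles can also come straight out of the database with PERCENTILE_CONT instead of hand-computed indexes. This is the SQL Server window-function form, and NOTE_STATS/NOTE_LENGTH are stand-ins for whatever we actually run this on:

-- NOTE_STATS(CORPUS, NOTE_LENGTH) is a placeholder for our per-note stats table.
SELECT CORPUS, Q1, MEDIAN, Q3, Q3 - Q1 AS IQR
FROM (
    SELECT DISTINCT CORPUS
         , PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY NOTE_LENGTH)
               OVER (PARTITION BY CORPUS) AS Q1
         , PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY NOTE_LENGTH)
               OVER (PARTITION BY CORPUS) AS MEDIAN
         , PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY NOTE_LENGTH)
               OVER (PARTITION BY CORPUS) AS Q3
    FROM NOTE_STATS
) q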
Big win for me today, figuring out on my own how to set up a new set of notes in the Clinical Text Explorer so we could start to get some abstraction done on them. I had to, among other things:
One thing I want to note is that I’m using the field Note_Dont_Use. I named it that because I was worried there might be truncated notes in there, but I verified there weren’t. So it’s cool.
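For the record, a check along these lines is one way to do that verification: if notes were being truncated, the longest lengths would all pile up at one maximum value. NOTE_STAGING is a placeholder table name and LEN is SQL Server’s:

-- If notes were being truncated, the longest lengths would all be pinned
-- at the column limit. NOTE_STAGING is a placeholder table name.
SELECT LEN(Note_Dont_Use) AS NoteLen
     , COUNT(*) AS NumNotes
FROM NOTE_STAGING
GROUP BY LEN(Note_Dont_Use)
ORDER BY NoteLen DESC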
Interesting meeting today on the Alcohol grant we’ll be submitting. The takeaway for me, though, is that for the first part we’ll be trying to identify, using machine learning (or otherwise, I suppose), those patients who have evidence of a drinking problem: e.g., ER admits while intoxicated, or ER visits for traumas we know are alcohol related, or (I think) pancreatitis.
Might actually fit into the delivery system since there’s a push to try to identify those patients who are undercoded…charted but not coded…and whose upcoding would lead to significantly increased revenue.
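A first cut at pulling that cohort might look like the query below. CLARITY_EDG is real, but PAT_ENC_DX as the encounter-diagnosis link table is my recollection and should be verified, and the LIKE patterns are obviously placeholders for a real code list:

-- Sketch: encounters carrying an alcohol-related diagnosis.
-- PAT_ENC_DX as the encounter/dx link table is an assumption to verify.
SELECT pedx.PAT_ENC_CSN_ID
     , dx.DX_NAME
FROM PAT_ENC_DX pedx
JOIN CLARITY_EDG dx
    ON dx.DX_ID = pedx.DX_ID
WHERE dx.DX_NAME LIKE '%alcohol%'
   OR dx.DX_NAME LIKE '%intoxicat%'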
Selecting tall rectangles
If you wish to select a very long column block that extends over many pages (for example, in a very long file), this might be the best technique:
- Click to position the cursor at the top left corner of the desired block.
- Scroll down to the desired end of the block by any means that does not change the cursor position (drag the vertical scroll bar, use the wheel of your mouse).
- Hold down alt+shift and click on the bottom right corner of the desired block.
Spent some time today looking for smart set data in Clarity. I had a patient ID and a snippet of text, but wasn’t able to find it. Though I did find one note with similar information in it.
The closest I was able to find was PAT_ENC_SMARTSET, which links a patient encounter to a smart set used in that encounter. However, SMARTSET_ID doesn’t seem to link to anything in Clarity. And anyway, it seems it would link to the smart set template, as it were, and not the encounter’s version of that smart set, which is what we’d want and which should be on PAT_ENC_SMARTSET.
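For next time, this is roughly the shape of the query I was poking around with. PAT_ENC_SMARTSET and SMARTSET_ID are as described above; the encounter-link column name (PAT_ENC_CSN_ID here) and the join to PAT_ENC are from memory, so verify before trusting:

-- Smart sets used in a given patient's encounters.
-- PAT_ENC_CSN_ID as the link column is from memory, not verified.
SELECT pe.PAT_ENC_CSN_ID
     , pe.CONTACT_DATE
     , pes.SMARTSET_ID
FROM PAT_ENC pe
JOIN PAT_ENC_SMARTSET pes
    ON pes.PAT_ENC_CSN_ID = pe.PAT_ENC_CSN_ID
WHERE pe.PAT_ID = 'Z9999999'                     -- placeholder patient ID
ORDER BY pe.CONTACT_DATE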