Back from the depths, where I prepared and delivered two presentations over the course of two weeks, with the kind of immersion I require when I do that….
- Figured out, again, how to log on to the Mayo cloud. Uploaded our stratified corpus.
- Got Java Web Start stuff to work by modifying the Java Control Panel.
- Drug NER meeting. Had an a-ha moment where I realized I could search for a string from my mystery drugs and potentially link it to a known BN/IN using RxCUI. See ‘erirthrowhatever’.
- Found the above example has two RxCUIs for the exact same orthography. This was mostly a head-scratcher.
- Got 100 more cspy and 100 more path notes.
- Reviewed cognitive and functional status grant.
Clarity Personal Hx Query
Got an e-mail today with a good query for getting personal history out of Clarity. A colleague put this together.
Look at use in medical hx of ICD9 codes with ‘colon’ in any of the diagnosis names for that code. This includes a record for each time a person’s medical hx is updated.
This is a rough proxy until a detailed code list is created.
SELECT dx.CURRENT_ICD9_LIST
, max(dx.DX_NAME) --dedupe by choosing an arbitrary dx_name for the ICD code
, count(*) as NumUsesInMedicalHx
FROM MEDICAL_HX m
LEFT OUTER JOIN CLARITY_EDG dx
ON m.DX_ID = dx.DX_ID
WHERE dx.DX_NAME like '%colon%'
and dx.DX_NAME not like '%colonization%'
GROUP BY dx.CURRENT_ICD9_LIST
ORDER BY dx.CURRENT_ICD9_LIST
Read this paper in prep for our manuscript. Seems like we’re on a similar path.
· Australia is like an entire SEER site (but it’s not SEER) where you have to report certain cancers to the government.
· Even though pathology labs do electronic reporting, the current reporting process to the government is done on paper. This is dumb, so can we automate this?
· The first step (corpus selection) is to identify notifiable reports. These are all cytology and histology reports excluding urine, sputum, and pap smear. This is done with a query on, I think, the HL7 data.
· The second step is to identify the histology type to see if it is a cancer notifiable result. This step itself has two steps.
· Step 2a (NER) is to go over all the SNOMED CT concepts and reason over them (so rule based) to see if they are descendants of one of the notifiable concepts, of which there are six. They pick the most advanced of the concepts from the report as the concept for the report.
· Step 2b (status annotation) is to mark each of the concepts that fit the notifiable criteria as absent, possible, or present.
· If any notifiable histologies are present and aren’t BCC or SCC of skin then the result is notifiable.
· Then there’s some discussion about supporting reports which makes no sense.
· “The ground truth was created based on an adjudication process between the reference data set provided by a domain expert and the output of the system for all reports in the development and evaluation set.”
· Somehow their corpora ended up having roughly equal number notifiable and non-notifiable reports, which seems crazy to me.
· They report sensitivity, PPV, specificity and F-score.
· They have 30 misclassifications over both training and test and report an “error rate” based on this.
· They have 7 false negatives and 23 false positives, and report the false negatives are more costly.
· The false negatives are mostly due to sectioning and status annotation errors.
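Putting steps 2a/2b and the final rule together, the notifiability decision might look something like the sketch below. The concept names, status labels, and skin-cancer exclusions are placeholders I made up, not the paper’s actual SNOMED CT concepts:

```python
# Rough sketch of the decision logic as I read it; labels are hypothetical.
SKIN_EXCLUSIONS = {"BCC of skin", "SCC of skin"}  # placeholder names

def is_notifiable(concepts):
    """concepts: list of (histology_name, status) pairs after NER + status
    annotation, where status is 'absent', 'possible', or 'present'.
    Notifiable iff any concept is present and not a skin-cancer exclusion."""
    return any(status == "present" and name not in SKIN_EXCLUSIONS
               for name, status in concepts)

report = [("adenocarcinoma of colon", "present"), ("BCC of skin", "possible")]
print(is_notifiable(report))  # this report would be notifiable
```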
Here are some notes on my findings looking into mystery medications. Remember, these are medications that do not start with a brand name (BN) or ingredient (IN), a class of things I refer to as bnin.
There was a suggestion at the last meeting to use what I thought I was told was a ‘synonym’ column for the bnin listing. It turns out that was more of a normalized form; that is, the bnin was normalized, and the column I was to use was the raw form. Whatever, using it worked well.
Modifications to code
- Any mystery med that started with a synonym was removed from the list of mystery meds because if we treat syns like bnins in a new class called bninsyns, then these meds all start with bninsyns.
- All remaining mystery meds’ n-grams were compared with bninsyns to find those mystery meds with a bninsyn somewhere in them, even if it wasn’t initial. These are referred to as solved mystery meds.
- Bug fix: When calculating the prefix of a solved mystery med, ensure that the bninsyn is full tokens, not partial tokens
- Bug fix: Can’t remember what this was.
- The number of remaining mystery meds (those without a bninsyn in them) went from 6895 four weeks ago to 3642.
- The number of unique prefixes (the part of a mystery med before the first (and longest in the case of a tie) bninsyn) went from 5796 to 869.
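For the record, the n-gram matching and prefix calculation above look roughly like this sketch. The tokenizer, the bninsyn set, and the example med string are all made up for illustration; the real lists come from RxNorm:

```python
# Minimal sketch: find the first (longest on a tie) bninsyn made of full
# tokens inside a mystery med, and compute the prefix before it.
def tokens(s):
    return s.lower().split()

def find_bninsyn(med, bninsyns):
    """Return (start_token, length_in_tokens) of the earliest-starting,
    longest bninsyn inside med, or None if there is no match."""
    toks = tokens(med)
    best = None
    for i in range(len(toks)):
        for j in range(len(toks), i, -1):  # longest n-grams first
            gram = " ".join(toks[i:j])
            if gram in bninsyns:
                if best is None or i < best[0] or (i == best[0] and j - i > best[1]):
                    best = (i, j - i)
    return best

bninsyns = {"aspirin", "potassium chloride"}  # hypothetical bninsyn set
med = "ec aspirin 81 mg tab"                  # a made-up mystery med
hit = find_bninsyn(med, bninsyns)
prefix = " ".join(tokens(med)[:hit[0]]) if hit else med
print(hit, prefix)  # the med is "solved"; its prefix is "ec"
```

Matching on full tokens (rather than raw substrings) is what the bug fix above was about: without it, a bninsyn like ‘aspir’ could match mid-token.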
So that’s good. On our status call today, it was pointed out to me that:
- There are many meds in the remaining mystery meds that do have BNs in them. But I’m not picking them up because they’re surrounded by brackets and I don’t tokenize, so the created n-grams don’t match the BN when I compare.
- The RxNav tool, which I’ve been having trouble running inside our firewall, is good, and can give RxNorm synonyms.
- Fix problems with brand names hidden due to brackets.
- See how well RxNav can get me the rest of the way on remaining mystery meds and even prefixes. E.g., asa -> aspirin, k+ -> potassium.
Got most of the way through figuring out the descriptive stats of our corpora. Only thing left to do is IQR and median on both test and training for progress and radiology notes combined.
The best thing I found for figuring out what indexes to use for IQR is on page 34 of my stats book.
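For reference, here’s a minimal sketch of the median/IQR calculation using the common linear-interpolation quantile rule (index h = q·(n−1) into the sorted data); the sample counts are made up:

```python
# Median and IQR via linear-interpolation quantiles; data are invented.
def quantile(sorted_xs, q):
    """Interpolated quantile at fractional index h = q * (n - 1)."""
    n = len(sorted_xs)
    h = q * (n - 1)
    lo = int(h)
    hi = min(lo + 1, n - 1)
    frac = h - lo
    return sorted_xs[lo] * (1 - frac) + sorted_xs[hi] * frac

def median_and_iqr(xs):
    xs = sorted(xs)
    return quantile(xs, 0.50), quantile(xs, 0.75) - quantile(xs, 0.25)

lengths = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # e.g., notes per patient
med, iqr = median_and_iqr(lengths)
print(med, iqr)  # 12.0 7.0
```

Note there are several conventions for the quartile indexes (hence the trip to the stats book); this one matches the widely used linear-interpolation default.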
Big win for me today, figuring out on my own how to set up a new set of notes in the Clinical Text Explorer so we could start to get some abstraction done on them. I had to, among other things:
- Get a SAS programmer to get the notes and metadata out of SAS for me.
- Dump the notes each into their own file along with the metadata into an Excel file.
- Import the metadata and notes to a SQL Server table.
- Create a View in the proper form with the proper links to the colonoscopy data.
- Hunted down the lookup table the ADE app uses to populate its Text Source menu. (This was my biggest victory since the two folks who know where this is are both effectively as reachable as one can be in the BWCA and I just pounded away at it.)
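The export step (one file per note, plus a metadata sheet suitable for bulk import) can be sketched like this toy version; the field names and sample notes are invented:

```python
# Toy export: write each note to its own text file and collect the
# metadata into a CSV for import into a SQL Server table.
import csv, os, tempfile

notes = [{"note_id": "1", "patient_id": "A", "text": "Colonoscopy normal."},
         {"note_id": "2", "patient_id": "B", "text": "Polyp removed."}]

outdir = tempfile.mkdtemp()
with open(os.path.join(outdir, "metadata.csv"), "w", newline="") as f:
    w = csv.DictWriter(f, fieldnames=["note_id", "patient_id", "filename"])
    w.writeheader()
    for n in notes:
        fname = f"note_{n['note_id']}.txt"
        with open(os.path.join(outdir, fname), "w") as nf:
            nf.write(n["text"])  # one file per note
        w.writerow({"note_id": n["note_id"], "patient_id": n["patient_id"],
                    "filename": fname})
print(sorted(os.listdir(outdir)))  # ['metadata.csv', 'note_1.txt', 'note_2.txt']
```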
One thing I want to note is that I’m using the field Note_Dont_Use. I named it that because I was worried there might be truncated notes in there, but I verified there weren’t. So it’s cool.
Interesting meeting today on the Alcohol grant we’ll be submitting. The takeaway for me, though, is that for the first part we’ll be trying to identify, using machine learning (or otherwise, I suppose) those patients who have evidence of a drinking problem. E.g., ER admits while intoxicated or seen in ER for traumas we know are alcohol related. E.g., pancreatitis (I think).
Might actually fit into the delivery system since there’s a push to try to identify those patients who are undercoded…charted but not coded…and whose upcoding would lead to significantly increased revenue.
Other stuff I did today
- Got more descriptive stats for the epi journal paper. Going ridonkulously slow, but today I got number of path reports for training and test and all cohorts as well as the number of patients in each cohort with at least one report.
- What made this go slow was realizing the directory of files wasn’t filtered by the same filter we use on our cohort (primary >= 1/1/95).
- Created text versions of the colonoscopy reports for the Panther project. Issue: Have no idea where the path reports are.
Selecting Columns in Notepad++
Selecting tall rectangles
If you wish to select a very long column block that extends over many pages (for example, in a very long file), this might be the best technique:
- Click to position the cursor at the top left corner of the desired block.
- Scroll down to the desired end of the block by any means that does not change the cursor position (drag the vertical scroll bar, use the wheel of your mouse).
- Hold down alt+shift and click on the bottom right corner of the desired block.
Spent some time today looking for smart set data in Clarity. I had a patient ID and a snippet of text, but wasn’t able to find it. Though I did find one note with similar information in it.
The closest I was able to find was PAT_ENC_SMARTSET, which links a patient encounter to a smart set used in that encounter. However, SMARTSET_ID doesn’t seem to link to anything in Clarity. And anyway, it seems it would link to the smart set template, as it were, and not the encounter’s version of that smart set, which is what we’d want and should be on PAT_ENC_SMARTSET.