Here are some notes on my findings from looking into mystery medications. Remember, these are medications that do not start with a brand name (BN) or ingredient (IN), a class of things I refer to as bnin.
There was a suggestion at the last meeting to use what I thought I was told was a ‘synonym’ column for the bnin listing. It turns out that was more of a normalized form, or else the bnin was the normalized one and the column I was to use was raw. Either way, using it worked well.
Modifications to code
- Any mystery med that started with a synonym was removed from the list of mystery meds: if we treat synonyms (syns) like bnins, in a new class called bninsyns, then these meds all start with a bninsyn.
- All remaining mystery meds’ n-grams were compared with bninsyns to find those mystery meds with a bninsyn somewhere in them, even if it wasn’t at the start. These are referred to as solved mystery meds.
- Bug fix: When calculating the prefix of a solved mystery med, ensure that the bninsyn matches whole tokens, not parts of tokens.
- Bug fix: Can’t remember what this was.
- The number of remaining mystery meds (those without a bninsyn in them) went from 6895 four weeks ago to 3642.
- The number of unique prefixes (the part of a mystery med before the first (and longest in the case of a tie) bninsyn) went from 5796 to 869.
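For the record, here is a rough sketch of the prefix logic described in the bullets above. It is not the project code: the function names, the whitespace tokenizer, and the `max_n` cap are all made up for illustration. It matches whole tokens only, takes the first bninsyn it finds, and prefers the longest one when several start at the same position.

```python
def tokenize(text):
    # Assumption: lowercase and split on whitespace only.
    return text.lower().split()


def find_first_bninsyn(med, bninsyns, max_n=5):
    """Return (prefix, bninsyn) for the first bninsyn in med, or None.

    Hypothetical helper: max_n caps the n-gram length that is tried.
    """
    tokens = tokenize(med)
    known = {tuple(tokenize(term)) for term in bninsyns}
    for start in range(len(tokens)):
        # Try the longest n-gram first so a tie at this position
        # goes to the longest bninsyn.
        for n in range(min(max_n, len(tokens) - start), 0, -1):
            gram = tuple(tokens[start:start + n])
            if gram in known:
                return " ".join(tokens[:start]), " ".join(gram)
    return None  # still a mystery med


result = find_first_bninsyn("low dose aspirin ec 81 mg", {"aspirin", "aspirin ec"})
print(result)  # ('low dose', 'aspirin ec')
```

Because the comparison is on tuples of whole tokens, a string like "xaspirin" does not count as containing "aspirin", which is the behaviour the first bug fix was after.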
So that’s good. On our status call today, it was pointed out to me that:
- There are many meds in the remaining mystery meds that do have BNs in them. But I’m not picking them up because they’re surrounded by brackets and I don’t tokenize, so the n-grams I create don’t match the BN when I compare.
- The RxNav tool, which I’ve been having trouble running inside our firewall, is good, and can give RxNorm synonyms.
- Fix problems with brand names hidden due to brackets.
- See how well RxNav can get me the rest of the way on remaining mystery meds and even prefixes. E.g., asa -> aspirin, k+ -> potassium.
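A minimal sketch of what the bracket fix might look like, assuming the problem is just that a whitespace split leaves the brackets glued to the brand name. The names here are hypothetical, and which bracket characters to strip (square, round, curly) is my guess rather than something checked against the data.

```python
import re

# Assumption: any of these bracket characters can hide a brand name.
BRACKETS = re.compile(r"[\[\](){}]")


def tokenize(text):
    # Replace brackets with spaces before splitting, so "[Tylenol]"
    # becomes the token "tylenol" instead of "[tylenol]".
    return BRACKETS.sub(" ", text.lower()).split()


def ngrams(tokens, max_n=3):
    # All word n-grams up to max_n tokens long, joined with spaces.
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


med = "acetaminophen 325 MG Oral Tablet [Tylenol]"
print("tylenol" in ngrams(med.lower().split()))  # False: token is "[tylenol]"
print("tylenol" in ngrams(tokenize(med)))        # True
```

Only brackets are stripped, so something like "k+" survives as a token and can still be looked up later.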
Got most of the way through figuring out the descriptive stats of our corpora. Only thing left to do is IQR and median on both test and training for progress and radiology notes combined.
The best thing I found for figuring out what indexes to use for IQR is on page 34 of my stats book.
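The index question matters because different quartile conventions give different IQRs on the same data. As a sanity check (not necessarily the convention in the book), Python's standard `statistics` module offers two of them via `statistics.quantiles`. The function name and the numbers below are made up for illustration; they are not our corpus stats.

```python
import statistics


def median_and_iqr(values, method="inclusive"):
    """Return (median, IQR) using statistics.quantiles (Python 3.8+).

    method is "inclusive" or "exclusive"; they interpolate the
    quartile positions differently.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    return statistics.median(values), q3 - q1


# Hypothetical per-note token counts, purely illustrative.
counts = [1, 2, 3, 4, 5, 6, 7, 8]
print(median_and_iqr(counts, "inclusive"))  # (4.5, 3.5)
print(median_and_iqr(counts, "exclusive"))  # (4.5, 4.5)
```

Same median, different IQR, so whichever convention gets used should be stated in the paper.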
Quick and dirty post before the day ends….
- Got some wins with figuring out the FAMILY_HX table as much as I did. I think peeps here are gonna like that. Had a good convo with boss about value of getting GHRI knowledge of data model.
- Got some more descriptive stats for the epidemiology paper.
- Had maje problems pushing to GitHub. Was ‘cuz my work pwd changed and I had to go change it in my .gitconfig file (in a protected dir, natch).
- Emphasized how important it is for us to get files in front of abstractors to start abstracting notes to give us a gold standard.
My time got divided among three (kinda two) main things today.
The first (and least) was attending some conference calls for SHARP. I was just kind of peripherally involved, though I was interested in the work that somebody from MIT is doing in terms of the status annotator.
Second, I spent some time reviewing all the edits in our Brcarec manuscript. There’s quite a bit of work, timewise, left to go into addressing those, but little to none of it is foundational, so it should be do-able. Boss says we’re kind of taking a risk submitting to an epi journal anyway, so we might as well submit sooner rather than later since it’s a crapshoot.
Third, I spent most of my time trying to structure the work for the intern we have coming in tomorrow. I’ll be “managing” his day-to-day work, so I wanted to have things as clearly laid out as possible so I wasn’t completely making it up on the fly. This also involved another good phone call with the RS on the project where we clarified next steps. I continue to describe how the pipeline works and where I think we should go, and we continue to move in that direction.
Oh, and big news yesterday in that Boss’s deidentification grant got funded.