<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>Where I write about what I did so I don’t forget.</description><title>My BioNLP Day</title><generator>Tumblr (3.0; @bionlp)</generator><link>http://bionlp.tumblr.com/</link><item><title>Status</title><description>&lt;p&gt;Back from the depths, where I prepared and delivered two presentations over the course of two weeks, with the kind of immersion I require when I do that&amp;#8230;.&lt;/p&gt;
&lt;p&gt;SHARP:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Figured out, again, how to log on to the Mayo cloud.  Uploaded our stratified corpus.&lt;/li&gt;
&lt;li&gt;Got Java Web Start stuff to work by modifying the Java Control Panel.&lt;/li&gt;
&lt;li&gt;Drug NER meeting.  Had an a-ha moment where I realized I could search for a string from my mystery drugs and potentially link it to a known BN/IN using RxCUI.  See &amp;#8216;erirthrowhatever&amp;#8217;.&lt;/li&gt;
&lt;li&gt;Found the above example has two RxCUIs for the exact same orthography.  This was mostly a head-scratcher.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;SuCCESS:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Got 100 more cspy and 100 more path notes.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Other:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Reviewed cognitive and functional status grant.&lt;/li&gt;
&lt;/ul&gt;</description><link>http://bionlp.tumblr.com/post/32363151804</link><guid>http://bionlp.tumblr.com/post/32363151804</guid><pubDate>Wed, 26 Sep 2012 17:28:54 -0700</pubDate><category>Java Web Start</category><category>SHARP</category><category>JWS</category><category>Drug NER</category><category>jnlp</category><category>Rxcui</category><category>SuCCESS</category><category>grants</category></item><item><title>Clarity Personal Hx Query</title><description>&lt;p&gt;Got an e-mail today with a good query for getting personal history out of Clarity.  A colleague put this together.&lt;/p&gt;


&lt;p&gt;Look at use in medical hx of ICD9 codes with &amp;#8216;colon&amp;#8217; in any of the diagnosis names for that code.  This includes a record for each time a person&amp;#8217;s medical hx is updated.  This is a rough proxy until a detailed code list is created.&lt;/p&gt;
&lt;blockquote&gt;SELECT dx.CURRENT_ICD9_LIST&lt;br/&gt;    , max(dx.DX_NAME) -- dedupe by choosing arbitrary dx_name for the icd code&lt;br/&gt;    , count(*) as NumUsesInMedicalHx&lt;br/&gt;FROM MEDICAL_HX m&lt;br/&gt;    LEFT OUTER JOIN CLARITY_EDG dx&lt;br/&gt;        ON m.DX_ID = dx.dx_id&lt;br/&gt;where dx.DX_NAME like '%colon%'&lt;br/&gt;    and dx.DX_NAME not like '%colonization%'&lt;br/&gt;group by dx.CURRENT_ICD9_LIST&lt;br/&gt;order by dx.CURRENT_ICD9_LIST&lt;/blockquote&gt;</description><link>http://bionlp.tumblr.com/post/30529822452</link><guid>http://bionlp.tumblr.com/post/30529822452</guid><pubDate>Thu, 30 Aug 2012 10:57:54 -0700</pubDate></item><item><title>Paper Summary</title><description>&lt;p&gt;Read &lt;a href="http://books.google.com/books?id=luRjxoo3gLgC&amp;amp;lpg=PA150&amp;amp;ots=DbXRDeCiZh&amp;amp;dq=%22Classification%20of%20pathology%20reports%20for%20cancer%20registry%20notifications%22&amp;amp;lr&amp;amp;pg=PA150#v=onepage&amp;amp;q=%22Classification%20of%20pathology%20reports%20for%20cancer%20registry%20notifications%22&amp;amp;f=false"&gt;this paper&lt;/a&gt; in prep for our manuscript.  Seems like we&amp;#8217;re on a similar path.&lt;/p&gt;

&lt;p&gt;·    Australia is like an entire SEER site (but it’s not SEER) where you have to report certain cancers to the government.&lt;br/&gt;·    Even though pathology labs do electronic reporting, the current reporting process to the government is done on paper.  This is dumb, so can we automate this?&lt;br/&gt;·    The first step (corpus selection) is to identify notifiable reports.  These are all cytology and histology reports excluding urine, sputum, and pap smear.  This is done with a query on, I think, the HL7 data.&lt;br/&gt;·    The second step is to identify the histology type to see if it is a cancer notifiable result.  This step itself has two steps.&lt;br/&gt;·    Step 2a (NER) is to go over all the SNOMED CT concepts and reason over them (so rule based) to see if they are descendants of one of the notifiable concepts, of which there are six.  They pick the most advanced of the concepts from the report as the concept for the report.&lt;br/&gt;·    Step 2b (status annotation) is to mark each of the concepts that fit the notifiable criteria as absent, possible, or present.&lt;br/&gt;·    If any notifiable histologies are present and aren’t BCC or SCC of skin then the result is notifiable.&lt;br/&gt;·    Then there’s some discussion about supporting reports which makes no sense.&lt;br/&gt;·    “The ground truth was created based on an adjudication process between the reference data set provided by a domain expert and the output of the system for all reports in the development and evaluation set.”&lt;br/&gt;·    Somehow their corpora ended up having roughly equal numbers of notifiable and non-notifiable reports, which seems crazy to me.&lt;br/&gt;·    They report sensitivity, PPV, specificity and F-score.&lt;br/&gt;·    They have 30 misclassifications over both training and test and report an “error rate” based on this.&lt;br/&gt;·    They have 7 false negatives and 23 false positives, and report the false negatives are more costly.&lt;br/&gt;·    The false negatives are mostly due to sectioning and status annotation errors.&lt;/p&gt;</description><link>http://bionlp.tumblr.com/post/30529734610</link><guid>http://bionlp.tumblr.com/post/30529734610</guid><pubDate>Thu, 30 Aug 2012 10:56:01 -0700</pubDate><category>brcarec</category><category>paper summary</category></item><item><title>Mystery Meds</title><description>&lt;p&gt;Here are some notes on my findings looking into mystery medications.  Remember, these are medications that do not start with brand name (BN) or ingredient (IN), a class of things I refer to as bnin.&lt;/p&gt;
&lt;p&gt;There was a suggestion at the last meeting to use what I thought I was told was a &amp;#8216;synonym&amp;#8217; column for the bnin listing.  It turns out that was more of a normalized form or the bnin was normalized and the column I was to use was raw.  Whatever, using it worked well.&lt;/p&gt;
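&lt;p&gt;For reference, here&amp;#8217;s a rough sketch of the kind of n-gram comparison behind the code changes listed next.  It is not the actual code: the function names, the four-token n-gram limit, and the sample bninsyns are all made up for illustration, and it only splits on whitespace.&lt;/p&gt;

```python
def ngrams(tokens, max_n=4):
    """Yield (start, n, text) for every token n-gram, left to right."""
    for start in range(len(tokens)):
        for n in range(1, min(max_n, len(tokens) - start) + 1):
            yield start, n, " ".join(tokens[start:start + n])


def solve_mystery_med(med, bninsyns, max_n=4):
    """Return (prefix, bninsyn) for the first bninsyn found in med, else None.

    Only whole whitespace-delimited tokens are compared, never partial
    tokens. If several bninsyns start at the same token, the longest wins.
    """
    tokens = med.lower().split()
    matches = [m for m in ngrams(tokens, max_n) if m[2] in bninsyns]
    if not matches:
        return None
    # Earliest start first; among ties, the n-gram with the most tokens.
    start, _n, text = min(matches, key=lambda m: (m[0], -m[1]))
    return " ".join(tokens[:start]), text


# Hypothetical bninsyn list, for illustration only.
bninsyns = {"aspirin", "potassium", "potassium chloride"}
print(solve_mystery_med("ec aspirin 81 mg", bninsyns))
print(solve_mystery_med("oral potassium chloride 10 meq", bninsyns))
print(solve_mystery_med("baby aspirins", bninsyns))
```

&lt;p&gt;An empty prefix would mean the med starts with a bninsyn, so it isn&amp;#8217;t a mystery med at all.&lt;/p&gt;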
&lt;p&gt;&lt;strong&gt;Modifications to code&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Any mystery med that started with a synonym was removed from the list of mystery meds because if we treat syns like bnins in a new class called bninsyns, then these meds all start with bninsyns.&lt;/li&gt;
&lt;li&gt;All remaining mystery meds&amp;#8217; n-grams were compared with bninsyns to find those mystery meds with a bninsyn somewhere in them, even if it wasn&amp;#8217;t initial.  These are referred to as solved mystery meds.&lt;/li&gt;
&lt;li&gt;Bug fix: When calculating the prefix of a solved mystery med, ensure that the bninsyn matches full tokens, not partial tokens.&lt;/li&gt;
&lt;li&gt;Bug fix: Can&amp;#8217;t remember what this was.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Results&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;The number of remaining mystery meds (those without a bninsyn in them) went from 6895 four weeks ago to 3642.&lt;/li&gt;
&lt;li&gt;The number of unique prefixes (the part of a mystery med before the first (and longest in the case of a tie) bninsyn) went from 5796 to 869.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;So that&amp;#8217;s good.  On our status call today, it was pointed out to me that:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;There are many meds in the remaining mystery meds that do have BNs in them.  But I&amp;#8217;m not picking them up because they&amp;#8217;re surrounded by []s and I don&amp;#8217;t tokenize, so the created n-grams don&amp;#8217;t match the BN when I compare.&lt;/li&gt;
&lt;li&gt;The RxNav tool, which I&amp;#8217;ve been having trouble running inside our firewall, is good, and can give RxNorm synonyms.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Next Steps:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Fix problems with brand names hidden due to brackets.&lt;/li&gt;
&lt;li&gt;See how well RxNav can get me the rest of the way on remaining mystery meds and even prefixes.  E.g., asa -&amp;gt; aspirin, k+ -&amp;gt; potassium.&lt;/li&gt;
&lt;/ul&gt;</description><link>http://bionlp.tumblr.com/post/29501325161</link><guid>http://bionlp.tumblr.com/post/29501325161</guid><pubDate>Wed, 15 Aug 2012 13:48:28 -0700</pubDate><category>sharp</category><category>mystery meds</category><category>status</category><category>drug ner</category></item><item><title>IQR Indexes</title><description>&lt;p&gt;Got most of the way through figuring out the descriptive stats of our corpora.  Only thing left to do is IQR and median on both test and training for progress and radiology notes combined.&lt;/p&gt;
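&lt;p&gt;For the median and IQR that are left, here&amp;#8217;s a rough sketch using Python&amp;#8217;s statistics module (quantiles needs Python 3.8 or later).  The note lengths are made-up numbers, and the &amp;#8220;inclusive&amp;#8221; quartile method is an assumption; it may not be the same index rule the book gives.&lt;/p&gt;

```python
import statistics


def median_and_iqr(values):
    """Return (median, q1, q3, iqr) for a list of numbers."""
    # n=4 asks for quartile cut points; "inclusive" treats the data as
    # the whole population rather than a sample from a larger one.
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3, q3 - q1


# Hypothetical token counts per note, for illustration only.
note_lengths = [120, 340, 95, 410, 230, 180, 275]
med, q1, q3, iqr = median_and_iqr(note_lengths)
print("median:", med, "Q1:", q1, "Q3:", q3, "IQR:", iqr)
```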
&lt;p&gt;The best thing I found for figuring out what indexes to use for IQR is on page 34 of my stats book.&lt;/p&gt;</description><link>http://bionlp.tumblr.com/post/29159740062</link><guid>http://bionlp.tumblr.com/post/29159740062</guid><pubDate>Fri, 10 Aug 2012 17:44:25 -0700</pubDate><category>stats</category><category>Brcarec</category><category>status</category></item><item><title>Schwing</title><description>&lt;p&gt;Big win for me today, figuring out on my own how to set up a new set of notes in the Clinical Text Explorer so we could start to get some abstraction done on them.  I had to, among other things:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Get a SAS programmer to get the notes and metadata out of SAS for me.&lt;/li&gt;
&lt;li&gt;Dump the notes each into their own file along with the metadata into an Excel file.&lt;/li&gt;
&lt;li&gt;Import the metadata and notes to a SQL Server table.&lt;/li&gt;
&lt;li&gt;Create a View in the proper form with the proper links to the colonoscopy data.&lt;/li&gt;
&lt;li&gt;Hunt down the lookup table the ADE app uses to populate its Text Source menu.  (This was my biggest victory since the two folks who know where this is are both effectively as reachable as one can be in the BWCA and I just pounded away at it.)&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;One thing I want to note is that I&amp;#8217;m using the field Note_Dont_Use.  I named it that because I was worried there might be truncated notes in there, but I verified there weren&amp;#8217;t.  So it&amp;#8217;s cool.&lt;/p&gt;</description><link>http://bionlp.tumblr.com/post/29084250046</link><guid>http://bionlp.tumblr.com/post/29084250046</guid><pubDate>Thu, 09 Aug 2012 16:26:50 -0700</pubDate></item><item><title>Alcohol</title><description>&lt;p&gt;Interesting meeting today on the Alcohol grant we&amp;#8217;ll be submitting.  The takeaway for me, though, is that for the first part we&amp;#8217;ll be trying to identify, using machine learning (or otherwise, I suppose) those patients who have evidence of a drinking problem.  E.g., ER admits while intoxicated or seen in ER for traumas we know are alcohol related.  E.g., pancreatitis (I think).&lt;/p&gt;
&lt;p&gt;Might actually fit into the delivery system since there&amp;#8217;s a push to try to identify those patients who are undercoded&amp;#8230;charted but not coded&amp;#8230;and whose upcoding would lead to significantly increased revenue.&lt;/p&gt;</description><link>http://bionlp.tumblr.com/post/29084005639</link><guid>http://bionlp.tumblr.com/post/29084005639</guid><pubDate>Thu, 09 Aug 2012 16:23:19 -0700</pubDate></item><item><title>Other stuff I did today</title><description>&lt;p&gt;&lt;ul&gt;&lt;li&gt;Got more descriptive stats for the epi journal paper.  Going ridonkulously slow, but today I got number of path reports for training and test and all cohorts as well as the number of patients in each cohort with at least one report.&lt;/li&gt;
&lt;li&gt;What made this go slow was realizing the directory of files wasn&amp;#8217;t filtered by the same filter we use on our cohort (primary &amp;gt;= 1/1/95).&lt;/li&gt;
&lt;li&gt;Created text versions of the colonoscopy reports for the Panther project.  Issue: Have no idea where the path reports are.&lt;/li&gt;
&lt;/ul&gt;&lt;/p&gt;</description><link>http://bionlp.tumblr.com/post/29017096650</link><guid>http://bionlp.tumblr.com/post/29017096650</guid><pubDate>Wed, 08 Aug 2012 17:37:30 -0700</pubDate></item><item><title>Selecting Columns in Notepad++</title><description>&lt;blockquote&gt;
&lt;h3&gt;&lt;span class="mw-headline"&gt;Selecting tall rectangles&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;If you wish to select a very long column block that extends over many pages (for example, in a very long file), this might be the best technique:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Click to position the cursor at the top left corner of the desired block.&lt;/li&gt;
&lt;li&gt;Scroll down to the desired end of the block by any means that does not change the cursor position (drag the vertical scroll bar, use the wheel of your mouse).&lt;/li&gt;
&lt;li&gt;Hold down alt+shift and click on the bottom right corner of the desired block.&lt;/li&gt;
&lt;/ul&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Column_Editing"&gt;Source&lt;/a&gt;.&lt;/p&gt;</description><link>http://bionlp.tumblr.com/post/29005612590</link><guid>http://bionlp.tumblr.com/post/29005612590</guid><pubDate>Wed, 08 Aug 2012 14:47:37 -0700</pubDate></item><item><title>Smartsets</title><description>&lt;p&gt;Spent some time today looking for smart set data in Clarity.  I had a patient ID and a snippet of text, but wasn&amp;#8217;t able to find it.  Though I did find one note with similar information in it.&lt;/p&gt;
&lt;p&gt;The closest I was able to find was PAT_ENC_SMARTSET, which links a patient encounter to a smart set used in that encounter.  However, SMARTSET_ID doesn&amp;#8217;t seem to link to anything in Clarity.  And anyway, it seems it would link to the smart set template, as it were, and not the encounter&amp;#8217;s version of that smart set, which is what we&amp;#8217;d want and should be on PAT_ENC_SMARTSET.&lt;/p&gt;</description><link>http://bionlp.tumblr.com/post/29002794775</link><guid>http://bionlp.tumblr.com/post/29002794775</guid><pubDate>Wed, 08 Aug 2012 14:05:40 -0700</pubDate></item><item><title>Care Everywhere</title><description>&lt;p&gt;Had a presentation from HQ today on our implementation of Epic&amp;#8217;s Care Everywhere and how it does and doesn&amp;#8217;t work for research.&lt;/p&gt;
&lt;p&gt;Two big takeaways:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;We get records when they&amp;#8217;re requested.  So care providers can do it.  We don&amp;#8217;t really have a way to go get them.  But when they&amp;#8217;re gotten they become a part of the clinical record.&lt;/li&gt;
&lt;li&gt;The outside records are stored on a different server.  An InterConnect server. (I&amp;#8217;m guessing on the capitalization there.)  She also made it sound like these records wouldn&amp;#8217;t be part of a Clarity extract, but it seems weird to me that Epic would be pulling from two Chronicles databases during a clinical visit.&lt;/li&gt;
&lt;/ol&gt;</description><link>http://bionlp.tumblr.com/post/28992038672</link><guid>http://bionlp.tumblr.com/post/28992038672</guid><pubDate>Wed, 08 Aug 2012 11:09:19 -0700</pubDate></item><item><title>Status</title><description>&lt;p&gt;Quick and dirty post before the day ends&amp;#8230;.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Got some wins with figuring out the FAMILY_HX table as much as I did.  I think peeps here are gonna like that.  Had a good convo with boss about value of getting GHRI knowledge of data model.&lt;/li&gt;
&lt;li&gt;Got some more descriptive stats for the epidemiology paper.&lt;/li&gt;
&lt;li&gt;Had maje problems pushing with github.  Was &amp;#8216;cuz my work pwd changed and I had to go change it in my .gitconfig file (in a protected dir natch).&lt;/li&gt;
&lt;li&gt;Emphasized how important it is for us to get files in front of abstractors to start abstracting notes to give us a gold standard.&lt;/li&gt;
&lt;/ul&gt;</description><link>http://bionlp.tumblr.com/post/28945528611</link><guid>http://bionlp.tumblr.com/post/28945528611</guid><pubDate>Tue, 07 Aug 2012 17:43:33 -0700</pubDate><category>status</category><category>daily</category><category>success</category><category>brcarec</category><category>git</category></item><item><title>FX</title><description>&lt;p&gt;Some valuable queries on the structured family history data available in Clarity.&lt;/p&gt;
&lt;blockquote&gt;select COUNT(*) N, zmh.Name, zmh.MEDICAL_HX_C code, fh.RELATION&lt;br/&gt;from FAMILY_HX fh&lt;br/&gt;    inner join ZC_MEDICAL_HX zmh&lt;br/&gt;        on fh.MEDICAL_HX_C = zmh.MEDICAL_HX_C&lt;br/&gt;where zmh.MEDICAL_HX_C = 20 or zmh.MEDICAL_HX_C = 30&lt;br/&gt;group by zmh.NAME, zmh.MEDICAL_HX_C, fh.RELATION&lt;br/&gt;order by N desc&lt;/blockquote&gt;
&lt;blockquote&gt;select zmh.Name, zmh.MEDICAL_HX_C code, fh.RELATION, fh.PAT_ID, fh.PAT_ENC_CSN_ID,&lt;br/&gt;            fh.PAT_ENC_DATE_REAL, fh.HX_LNK_ENC_CSN, pe.pat_enc_date_real, fh.CONTACT_DATE, fh.LINE&lt;br/&gt;from FAMILY_HX fh&lt;br/&gt;    inner join ZC_MEDICAL_HX zmh&lt;br/&gt;        on fh.MEDICAL_HX_C = zmh.MEDICAL_HX_C&lt;br/&gt;    left outer join pat_enc pe on fh.hx_lnk_enc_csn = pe.pat_enc_csn_id&lt;br/&gt;where zmh.MEDICAL_HX_C = 20 or zmh.MEDICAL_HX_C = 30&lt;br/&gt;order by PAT_ID, fh.PAT_ENC_DATE_REAL asc, fh.LINE&lt;/blockquote&gt;
&lt;blockquote&gt;select fh.PAT_ENC_CSN_ID, pe1.pat_enc_date_real, fh.HX_LNK_ENC_CSN, pe2.PAT_ENC_DATE_REAL&lt;br/&gt;from FAMILY_HX fh&lt;br/&gt;    inner join PAT_ENC pe1 on fh.PAT_ENC_CSN_ID = pe1.PAT_ENC_CSN_ID&lt;br/&gt;    inner join PAT_ENC pe2 on fh.HX_LNK_ENC_CSN = pe2.PAT_ENC_CSN_ID&lt;br/&gt;where fh.PAT_ENC_CSN_ID &amp;lt;&amp;gt; fh.HX_LNK_ENC_CSN&lt;br/&gt;    and pe2.PAT_ENC_DATE_REAL &amp;gt; pe1.pat_enc_date_real&lt;/blockquote&gt;
&lt;p&gt;We have one remaining question, which is what the heck is the HX_LNK_ENC_CSN field? The description in the Clarity data dictionary isn&amp;#8217;t clear.&lt;/p&gt;
&lt;p&gt;UPDATE 8/9/12: A colleague figured out what that HX_LNK_ENC_CSN field is for.  The deal is that when a FX record is created, it&amp;#8217;s given its own encounter (kind of weird) and that encounter is in PAT_ENC_CSN_ID.  If that FX is created in the context of a real-life encounter, then the ID of that encounter goes in HX_LNK_ENC_CSN.  (Otherwise that column is NULL.)&lt;/p&gt;</description><link>http://bionlp.tumblr.com/post/28922892416</link><guid>http://bionlp.tumblr.com/post/28922892416</guid><pubDate>Tue, 07 Aug 2012 12:03:00 -0700</pubDate><category>success</category><category>sql</category><category>queries</category><category>family history</category><category>clarity</category></item><item><title>Wildcat and Panther Status Meeting</title><description>&lt;p&gt;Spent a lot of the day catching up on things after a week away at training.  Here&amp;#8217;s a basic rundown of the things to remember from the day, most of which sprang from a status meeting for the Wildcat and Panther projects.&lt;/p&gt;
&lt;p&gt;Panther Notes:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;The work on family history and results is on track per the schedule.&lt;/li&gt;
&lt;li&gt;The work on symptoms is behind schedule.  We had hoped to have results and error analysis done on that by now, but we don&amp;#8217;t have a gold standard yet, so nothing there.  More on that later.&lt;/li&gt;
&lt;li&gt;The documentation of the pipeline is mostly done.  It&amp;#8217;s probably something that will live, though, so we decided I&amp;#8217;d review it now and we could make sure it&amp;#8217;s aimed in the right direction.&lt;/li&gt;
&lt;li&gt;Dev has added some (impressive) rules for family history, but we are suffering a data sparsity problem.&lt;/li&gt;
&lt;li&gt;We assume that any mention of family history is for colorectal cancer.  (Hoping the investigators don&amp;#8217;t care it doesn&amp;#8217;t distinguish polyps, though we have some ideas on how we may capture that, but, again, need more data.)&lt;/li&gt;
&lt;li&gt;(I sent off some information on the family history that&amp;#8217;s available from structured data.)&lt;/li&gt;
&lt;li&gt;We had a win on case-sensitivity using the Java Patterns library, reducing the size of the jape file nicely.&lt;/li&gt;
&lt;li&gt;Some rules from the Pitt pipeline have been combined to make the same output (but still exist as separate rules in case we want to separate later).  E.g., A bunch of stuff -&amp;gt; Lower abdominal symptoms.&lt;/li&gt;
&lt;li&gt;Some rules from the Pitt pipeline have been separated.  E.g., GI Bleeding -&amp;gt; Lower GI Bleeding and Bleeding&lt;/li&gt;
&lt;li&gt;Where we don&amp;#8217;t have an example, the code has been built and is ready for examples to be put into the jape grammar.  E.g., Acute bowel obstruction.&lt;/li&gt;
&lt;li&gt;The biggest issue is with the lack of a gold standard.  I kind of thought we&amp;#8217;d have it by sometime last week when I was gone, but there hasn&amp;#8217;t been good communication/handoff, and so the reports got created on Thursday night and nothing happened with them Friday.  Now we need to get them into a user-friendly database and then have abstractors review the notes to create a gold standard.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Wildcat status:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Results are still good.&lt;/li&gt;
&lt;li&gt;Some work has been done on feature selection.&lt;/li&gt;
&lt;li&gt;I suggested doing the post-processor before going down into the categories where we only have a few examples.&lt;/li&gt;
&lt;li&gt;The post-processor takes the results of each binary classifier and puts it through a filter, where it keeps at most the five &amp;#8220;most important&amp;#8221; positive classifications, where the &amp;#8220;most important&amp;#8221; list is compiled by humans.&lt;/li&gt;
&lt;li&gt;I pointed out this could be problematic if we let a more important dx with a confidence score of, e.g., 51% go through and not a less important dx with a confidence score of, e.g., 90%.  So that&amp;#8217;s food for thought.&lt;/li&gt;
&lt;/ul&gt;</description><link>http://bionlp.tumblr.com/post/28874097335</link><guid>http://bionlp.tumblr.com/post/28874097335</guid><pubDate>Mon, 06 Aug 2012 18:01:26 -0700</pubDate></item><item><title>Clone command</title><description>&lt;p&gt;Ugh, because I always forget it:&lt;/p&gt;
&lt;p&gt;&amp;gt; git clone &lt;a href="http://github.com/shalgrim/util.git"&gt;http://github.com/shalgrim/util.git&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Do not use https&amp;#8230;it hangs here at work.&lt;/p&gt;</description><link>http://bionlp.tumblr.com/post/28145515064</link><guid>http://bionlp.tumblr.com/post/28145515064</guid><pubDate>Fri, 27 Jul 2012 13:16:44 -0700</pubDate></item><item><title>Other Tasks From Today</title><description>&lt;p&gt;&lt;ul&gt;&lt;li&gt;Cleaned up the figures I&amp;#8217;m thinking of getting into the next round of the BrCaRec epidemiology paper that is coming along too slowly.&lt;/li&gt;
&lt;li&gt;Decided to hold off until the week of 8/6 to work on an article for the internal data-centric publication because maybe I&amp;#8217;ll learn something in training next week that would be good to turn into an article.&lt;/li&gt;
&lt;/ul&gt;&lt;/p&gt;</description><link>http://bionlp.tumblr.com/post/28086943067</link><guid>http://bionlp.tumblr.com/post/28086943067</guid><pubDate>Thu, 26 Jul 2012 16:34:47 -0700</pubDate><category>brcarec</category><category>admin</category><category>paper</category><category>writing</category></item><item><title>Windows and Paths</title><description>&lt;p&gt;For the most part I think Windows has been and is a good OS.  But gawd the way it deals with paths, while it&amp;#8217;s getting better, has caused me so much frustration over the years.&lt;/p&gt;
&lt;p&gt;Most of my time today was spent trying to get a script set up to hotcopy our svn repository to a network shared drive.  The network drives are backed up regularly, while our C drives are not, hence the desire for the regular hotcopy.&lt;/p&gt;
&lt;p&gt;Anyway, kept getting a &amp;#8216;path does not exist&amp;#8217; error from my Python script, but only when I ran it as a Scheduled Task.  In troubleshooting, I found that if I ran my script from the command line, it confirmed that the drive letter was a path, but when run from a scheduled task it said it wasn&amp;#8217;t.&lt;/p&gt;
&lt;p&gt;Oddly, os.path.ismount recognized the letter as a drive, and &lt;a href="http://www.computerhope.com/issues/ch000854.htm#2"&gt;this awesome command&lt;/a&gt; listed it as a drive letter both when run from the command line and as a scheduled task.&lt;/p&gt;
&lt;p&gt;Anyway, I finally gave up and just passed the script the UNC path to the network share via the config file.  I&amp;#8217;d gotten out of the habit of using UNC paths because cmd doesn&amp;#8217;t allow you to use them as a current directory.  (There&amp;#8217;s &lt;a href="http://tinyapps.org/blog/windows/201201190700_cd_unc_path.html"&gt;kind of a workaround&lt;/a&gt; with pushd.)&lt;/p&gt;
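&lt;p&gt;Roughly what that looks like, as a sketch rather than my actual script: the destination comes from a config file as a UNC path, so nothing depends on a mapped drive letter.  The [backup] section, its key names, and both paths are hypothetical; svnadmin hotcopy itself takes the repository path followed by the new copy&amp;#8217;s path.&lt;/p&gt;

```python
import configparser
import os
import subprocess


def load_backup_paths(config_text):
    """Parse the (hypothetical) [backup] section and return (repo, dest)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser["backup"]["repo"], parser["backup"]["dest"]


def hotcopy_command(repo, dest):
    """Build the svnadmin hotcopy argument list without running it."""
    return ["svnadmin", "hotcopy", repo, dest]


def run_backup(config_text):
    """Run the hotcopy, failing early if the share is not reachable."""
    repo, dest = load_backup_paths(config_text)
    # On Windows, check that the folder that will hold the copy is
    # reachable before handing anything to svnadmin.
    parent = os.path.dirname(dest)
    if not os.path.isdir(parent):
        raise FileNotFoundError(f"backup location not reachable: {parent}")
    subprocess.run(hotcopy_command(repo, dest), check=True)


# Made-up paths; the destination is a UNC path, not a drive letter.
EXAMPLE_CONFIG = r"""
[backup]
repo = C:\svn\repos
dest = \\fileserver\share\svn-backup\repos
"""

repo, dest = load_backup_paths(EXAMPLE_CONFIG)
print(hotcopy_command(repo, dest))
```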
&lt;p&gt;So that seems to have worked and, as always with these kinds of things, I&amp;#8217;m mostly upset I had to spend so much time on it.&lt;/p&gt;
&lt;p&gt;To follow up, I asked our resident tech stud why I couldn&amp;#8217;t access the network drive from a scheduled task, and he said it was because that drive is mapped by a server-side logon script which isn&amp;#8217;t run for non-interactive logins.  Huynh.&lt;/p&gt;</description><link>http://bionlp.tumblr.com/post/28086624297</link><guid>http://bionlp.tumblr.com/post/28086624297</guid><pubDate>Thu, 26 Jul 2012 16:29:00 -0700</pubDate><category>drives</category><category>paths</category><category>mapping</category><category>map</category><category>drive</category><category>path</category><category>dos</category><category>python</category><category>path does not exist</category></item><item><title>Daily Writeup 7/24/2012</title><description>&lt;p&gt;Two big tasks today.&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Worked on scheduled task for backing up the svn repository.  Just because it had to be done.&lt;/li&gt;
&lt;li&gt;Talked to Panther project dev about project status.  He had a really good documentation of the pipeline.  Very heartening.&lt;/li&gt;
&lt;/ol&gt;</description><link>http://bionlp.tumblr.com/post/27937033683</link><guid>http://bionlp.tumblr.com/post/27937033683</guid><pubDate>Tue, 24 Jul 2012 15:33:37 -0700</pubDate></item><item><title>Daily Writeup</title><description>&lt;p&gt;&lt;span&gt;&lt;strong&gt;STATUS&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Success Project:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Good meeting with the project runners and the abstractors, designing a form to capture all the information they want the NLP algorithm to abstract from the notes.  We&amp;#8217;ll be using this primarily to generate a gold standard&amp;#8230;not so much actually annotating the notes to get training data.&lt;/li&gt;
&lt;li&gt;Exposures -&amp;gt; Indication or Result&lt;/li&gt;
&lt;li&gt;Covariates -&amp;gt; Family History&lt;/li&gt;
&lt;li&gt;We have three levels of FH and the first level (is there any FH) is kind of what the current algorithm is doing?  After that we want to know if an immediate family member has a history of colon cancer, rectal cancer, or colorectal cancer.  The third level is, I think, if one of those peeps has history of polyps.&lt;/li&gt;
&lt;li&gt;To do: Will have a pre-processing task converting Excel documents into machine-readable formats&amp;#8230;i.e., something closer to the output of the pipeline so that we can do comparisons.  (Suggested wxpython, pywin32, and/or python-excel for this task.)&lt;/li&gt;
&lt;li&gt;Decision: Each file will be named CHSID.xls where CHSID is the CHSID.  There will be a tab for each report: CSPY[date] for colonoscopies and PATH[date] for pathology.&lt;/li&gt;
&lt;li&gt;I prettied up the abstraction form that we were editing in the meeting and sent it to the project runner for a kind of final software.&lt;/li&gt;
&lt;li&gt;Had a great meeting where we learned a lot about our processes involving colonoscopies&amp;#8230;how we interact with patients in a surveillance loop, of screening age, etc.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Infrastructure&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Worked on setting up a scheduled task to back up our repo.  Can&amp;#8217;t really get a good thing going with Task Scheduler to notify me if a task fails, so I think I&amp;#8217;m going to have the scripts write to a log.  Then I&amp;#8217;ll build on the work I already did and follow that task with one that e-mails me the day&amp;#8217;s log lines.  Finally, I&amp;#8217;ll add a daily task reminding me to check my e-mail for that night&amp;#8217;s tasks.&lt;/li&gt;
&lt;/ul&gt;</description><link>http://bionlp.tumblr.com/post/27872213564</link><guid>http://bionlp.tumblr.com/post/27872213564</guid><pubDate>Mon, 23 Jul 2012 17:38:30 -0700</pubDate></item><item><title>First Priorities</title><description>&lt;p&gt;I should have doco&amp;#8217;d yesterday what we decided we wanted to get first from the success project.&lt;/p&gt;
&lt;p&gt;Those items are family history, symptoms, and results.  But most of the current output can go into results or symptoms.  So here&amp;#8217;s a more specific definition.&lt;/p&gt;
&lt;p&gt;Symptoms: These are pretty much indications as put out by the current pipeline.  I&amp;#8217;ve sent our team the current set of indications that can occur, the core concepts that trigger them, and the grammar that originally creates those core concepts, so they can review how well they currently fit.  There&amp;#8217;s more to it than just grammar -&amp;gt; core concept -&amp;gt; indication, but that&amp;#8217;s at least a pretty clear picture of most of how we get from text to indication.&lt;/p&gt;
&lt;p&gt;Results will break down as&amp;#8230;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Cancer found? Yes or no.&lt;/li&gt;
&lt;li&gt;Polyp found? Yes or no.&lt;/li&gt;
&lt;li&gt;If polyp found, what is type (adenoma, etc.) and size.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;And family history is basically defined by the current pipeline as colon cancer or polyps experienced by person other than patient.  It can currently have one of these three values:&lt;/p&gt;
&lt;p&gt;0 - Negated family history&lt;br/&gt;1 - Family history&lt;br/&gt;2 - No mention of family history&lt;/p&gt;
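&lt;p&gt;A quick sketch of those three values and the precedence between them: a negated mention wins over an affirmed one, and no relevant mention at all gives 2.  This is Python, not the pipeline&amp;#8217;s Java, and the dict keys and string values are my own stand-ins for whatever the annotations actually carry.&lt;/p&gt;

```python
NEGATED, AFFIRMED, NO_MENTION = 0, 1, 2
TARGET_CONCEPTS = {"colon cancer", "polyp"}


def family_history(core_concepts):
    """Collapse a note's core concepts into one family-history value."""
    # Keep only colon cancer / polyp concepts experienced by someone
    # other than the patient.
    relevant = [
        c for c in core_concepts
        if c["concept"] in TARGET_CONCEPTS and c["experiencer"] == "other"
    ]
    # Any relevant concept that is not affirmed makes the result negated.
    if any(c["directionality"] != "affirmed" for c in relevant):
        return NEGATED
    # Otherwise every relevant concept left is affirmed.
    if relevant:
        return AFFIRMED
    return NO_MENTION


# Hypothetical annotations for one note.
note = [
    {"concept": "polyp", "experiencer": "patient", "directionality": "affirmed"},
    {"concept": "colon cancer", "experiencer": "other", "directionality": "affirmed"},
]
print(family_history(note))  # prints 1
```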
&lt;p&gt;And here&amp;#8217;s an outline of how it&amp;#8217;s calculated (ref. AnnotationIntegrator.java lines 518-544):&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;If any core concept is (colon cancer or polyp) AND (experiencer is other) AND (directionality is not affirmed), set family history to 0.&lt;/li&gt;
&lt;li&gt;Else, if there are any core concepts that are (colon cancer or polyp) AND (experiencer is other) AND (directionality is affirmed), set family history to 1.&lt;/li&gt;
&lt;li&gt;Else set family history to 2.&lt;/li&gt;
&lt;/ol&gt;</description><link>http://bionlp.tumblr.com/post/27146367599</link><guid>http://bionlp.tumblr.com/post/27146367599</guid><pubDate>Fri, 13 Jul 2012 14:05:13 -0700</pubDate><category>success</category><category>requirements</category></item></channel></rss>
