We saw in the previous post that ICD9 codes show promise for pre-classifying encounters that are more likely to contain our concepts of interest. In this post we’ll walk through building simple logistic regression classifiers on a training data set and evaluating their performance on a test data set.

Overview

As described in the previous post, our goal here is to build a classifier based on anything except free text data to select encounters more likely to have notes containing concepts of interest (e.g. ‘substance abuse’). The reason for this is that we want to build up our dataset and pre-select notes more likely to have our concepts, which are normally of low prevalence in the overall dataset. We will later use these notes to train NLP classifiers to detect the presence of concepts in individual notes based on the text of the note.

In the previous post we found that many of the concepts have patterns of ICD9 codes that are more likely than normal to appear when the concept is present in a patient’s notes. This is promising for building a classifier, as it tells us that the ICD9 codes carry information about the class labels. It also suggests that we can build a classifier from purely additive terms (e.g. a + b) and likely will not need to include cross terms (e.g. a*b) to get decent performance.
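
To make the distinction concrete, the two model forms can be written as below. This is only a notational sketch: it assumes each x_i is the 0/1 indicator for one ICD9 code group and p is the probability that a note contains the concept, and the symbols are introduced here for illustration.

```latex
\log\frac{p}{1-p} = w_0 + \sum_{i=1}^{n} w_i x_i
\qquad\text{versus}\qquad
\log\frac{p}{1-p} = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i<j} w_{ij}\, x_i x_j
```

With n = 181 features, the second form would add 181 * 180 / 2 = 16,290 pairwise weights, which is a lot to estimate from a low-prevalence data set.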

Approach

In this notebook I’ll be using scikit-learn’s implementation of logistic regression, specifically the LogisticRegressionCV class, which runs a cross-validation loop to choose the strength of the L2 regularization. L2 regularization shrinks the weights of uninformative features toward zero, so that only a small subset of our 181 features carries meaningful weight. Note that it does not remove features outright; an L1 penalty would be needed to set weights exactly to zero.
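
As a rough illustration of how the cross-validated penalty behaves, the sketch below fits LogisticRegressionCV on synthetic binary features (not the ICD9 data) and counts how many weights end up exactly zero under the default L2 penalty and under an L1 penalty. The data, seed, and variable names are assumptions made up for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)

# Synthetic stand-in for the ICD9 indicator matrix: 400 notes x 20 binary features.
X = rng.integers(0, 2, size=(400, 20)).astype(float)
# Only the first three features carry signal about the label.
logit = -1.5 + 2.0 * X[:, 0] + 1.5 * X[:, 1] - 1.0 * X[:, 2]
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Default penalty is L2: weights shrink, but typically stay nonzero.
l2_model = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000).fit(X, y)

# L1 needs a solver that supports it (liblinear or saga) and can zero weights out.
l1_model = LogisticRegressionCV(
    Cs=10, cv=5, penalty="l1", solver="liblinear", max_iter=1000
).fit(X, y)

n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
print("chosen C (L2):", np.ravel(l2_model.C_)[0], "| zeroed weights:", n_zero_l2)
print("chosen C (L1):", np.ravel(l1_model.C_)[0], "| zeroed weights:", n_zero_l1)
```

A small chosen C means strong regularization, which is one plausible reason some categories in the output further down have uniformly tiny weights.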

One go-to option I could have used is AdaBoost. AdaBoost tends to over-weight mislabeled data points, because each “decision stump” is trained in series and the points the previous stumps got wrong are given extra weight. I don’t want that in this problem, as ICD9 codes are not expected to give labels that match the notes well. This is especially true because the ICD9 codes cover a patient’s entire multi-day ICU stay, which may have 20 notes, only some of which contain our concepts. In those cases every note is assigned the same ICD9 codes, but only some notes have the concepts.

Another option was random forest (RF), but that also has its problems. In particular, I wanted to be able to look at feature importance. RF has a method for inspecting that, but it’s not as straightforward as for logistic regression (LR). That said, RF is great when many features represent continuous measurements, which can be problematic in LR because everything has to be normalized to comparable ranges for the weights to be comparable.

The overall process followed here is:

  1. Randomly assign every note to either the test or the training set by drawing a uniform random number per note and comparing it to a threshold (0.3): notes below the threshold go to the test set, giving a 70% - 30% training/test split.

  2. For each category, perform the following

    1. Create a logistic regression classifier using LogisticRegressionCV

    2. Extract the feature weights and print the most important features

    3. Use the classifier to predict labels for our test dataset

    4. Find the 50% sensitivity point: the probability threshold at which a truly positive note has a 50-50 chance of being labeled a true positive or a false negative. We’ll use this threshold for labeling our points later.

    5. Evaluate performance by calculating and plotting the ROC curve and confusion matrix.
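
Steps 3 to 5 can be sketched in isolation as follows. This is a minimal example on synthetic scores, not the notebook’s data; the names `y_true`, `y_score`, and `cutoff` are placeholders chosen for this illustration.

```python
import numpy as np
from sklearn import metrics

rng = np.random.default_rng(1)

# Synthetic labels (about 10% positive) and scores that are higher for positives.
y_true = (rng.random(2000) < 0.1).astype(int)
y_score = np.clip(rng.normal(0.2 + 0.3 * y_true, 0.15), 0.0, 1.0)

fpr, tpr, thresh = metrics.roc_curve(y_true, y_score)
auc = metrics.auc(fpr, tpr)

# Index of the ROC point whose sensitivity (TPR) is closest to 0.5.
idx = np.abs(tpr - 0.5).argmin()
cutoff = thresh[idx]

# roc_curve counts a score as positive when score >= threshold.
y_label = (y_score >= cutoff).astype(int)
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_label).ravel()
sensitivity = tp / (tp + fn)
print(f"AUC={auc:.3f} cutoff={cutoff:.3f} sensitivity={sensitivity:.3f}")
```

Because `roc_curve` treats scores at or above a threshold as positive, applying the cutoff with `>=` reproduces the sensitivity read off the curve.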

Conclusion

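Overall, very promising results! Most concept categories reach test-set AUCs between roughly 0.72 and 0.87. Chronic.Pain.Fibromyalgia, Obesity and Depression are weaker (0.64 to 0.68), Developmental.Delay.Retardation scores 0.97 but on only 7 positive test notes, and Unsure is close to chance (0.53). This should improve our selection of notes for annotation well above chance, and improve our final data set.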

Looking in detail we see that the intuition we got from looking at the ICD9 code odds ratios was confirmed by the logistic regression feature weights. A few examples below:

  • Advanced.Heart.Disease

    • (code, 420-429) OTHER FORMS OF HEART DISEASE
    • (code, 410-414) ISCHEMIC HEART DISEASE
    • (code, 393-398) CHRONIC RHEUMATIC HEART DISEASE
    • (code, 785) Symptoms involving cardiovascular system
  • Advanced.Lung.Disease

    • (code, 510-519) OTHER DISEASES OF RESPIRATORY SYSTEM
    • (code, V46) Other dependence on machines and devices
    • (code, 460-466) ACUTE RESPIRATORY INFECTIONS
    • (code, 490-496) CHRONIC OBSTRUCTIVE PULMONARY DISEASE AND ALLIED CONDITIONS
  • Alcohol.Abuse

    • (code, 570-579) OTHER DISEASES OF DIGESTIVE SYSTEM
    • (code, 290-299) PSYCHOSES
    • (code, V60) Housing, household, and economic circumstances
    • (code, 070-079) OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE
    • (code, V08) Asymptomatic human immunodeficiency virus [HIV] infection status
  • Obesity

    • (code, 270-279) OTHER METABOLIC AND IMMUNITY DISORDERS
    • (code, 700-709) OTHER DISEASES OF SKIN AND SUBCUTANEOUS TISSUE
    • (code, 510-519) OTHER DISEASES OF RESPIRATORY SYSTEM
    • (code, 415-417) DISEASES OF PULMONARY CIRCULATION
    • (code, 327) ORGANIC SLEEP DISORDERS
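
One way to relate these weights back to odds ratios: for a 0/1 indicator feature, exp(weight) is the factor by which the model multiplies the odds when the code is present and all other codes are held fixed. The sketch below applies this to three weights copied from the notebook output further down. Since the weights are regularized, the resulting ratios are shrunken and should not be expected to equal the raw odds ratios from the previous post.

```python
import math

# Weights copied from the notebook output below (Advanced.Cancer and Alcohol.Abuse).
weights = {
    "(code, 190-199) Advanced.Cancer": 2.526317,
    "(code, V10) Advanced.Cancer": 1.244528,
    "(code, V60) Alcohol.Abuse": 1.134736,
}

# exp(weight) is the factor by which the odds are multiplied when the code is present.
odds_ratios = {name: math.exp(w) for name, w in weights.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: odds multiplied by {ratio:.2f}")
```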

There are some oddities. For example, the top-weighted code for Advanced.Lung.Disease was “OSTEOPATHIES, CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES”. Possibly a medication for these conditions is related to lung disease, or the association could be spurious; this would require further investigation.

Also note that there is no way to assign causation here. Looking at Obesity, for example, obesity can lead to various diseases and sleep disorders, but people with sleep disorders and other diseases may also exercise less or have a lower metabolism, which contributes to weight gain. Similarly, alcohol abuse and addiction can lead to personal choices that contribute to poor economic circumstances, or people in poor economic circumstances can be predisposed to alcohol abuse.

Notebook

Full notebook available here

In [133]:
fit_dat = feat_vecs.dropna().copy()
fit_dat.loc[:, 'random'] = np.random.rand(fit_dat.shape[0])  # one uniform draw per note (1-D, one value per row)
fit_dat.head()
Out[133]:
category Advanced.Cancer Advanced.Heart.Disease Advanced.Lung.Disease Alcohol.Abuse Chronic.Neurological.Dystrophies Chronic.Pain.Fibromyalgia Dementia Depression Developmental.Delay.Retardation Non.Adherence None Obesity Other.Substance.Abuse Schizophrenia.and.other.Psychiatric.Disorders Unsure (code, 001-009) (code, 030-041) (code, 042) (code, 047) (code, 050-059) (code, 062) (code, 070-079) (code, 110-118) (code, 120-129) (code, V26) (code, V42) (code, V43) (code, V44) (code, V45) (code, V46) (code, V49) (code, V50) (code, V53) (code, V54) (code, V55) (code, V58) (code, V59) (code, V60) (code, V62) (code, V63) (code, V64) (code, V65) (code, V66) (code, V69) (code, V70) (code, V85) (code, V87) (code, V88) random
subject_id md5
68 27572b36bd4c26c322f50cf65d095d16 Nursing/Other 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.432180
109 27d1f5907fa14b6702837a845f84c54e Nursing/Other 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.670607
3e0fff775cfb678fdfa06ece68ebfab5 Nursing/Other 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.620208
8efc0a2ff698b75ce183e3183c1bf204 Nursing/Other 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.027760
f5f69772c32f1b0ac05b7cf408f7a6db Discharge 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.595216

5 rows × 198 columns

In [134]:
test_frac = 0.3
X_test = fit_dat.loc[fit_dat['random'] < test_frac, feature_cols].values
X_train = fit_dat.loc[fit_dat['random'] >= test_frac, feature_cols].values
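
The split above uses unseeded random draws, so it will differ from run to run. A minimal sketch of a reproducible variant is shown below on a made-up miniature frame; `demo`, `demo_cols`, and the seed value are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so the split can be reproduced

# Hypothetical miniature of fit_dat: two feature columns and one label column.
demo = pd.DataFrame({
    "feat_a": rng.integers(0, 2, size=1000),
    "feat_b": rng.integers(0, 2, size=1000),
    "label": rng.integers(0, 2, size=1000),
})
demo["random"] = rng.random(len(demo))

test_frac = 0.3
is_test = demo["random"] < test_frac  # one mask reused for features and labels

demo_cols = ["feat_a", "feat_b"]
X_test_demo = demo.loc[is_test, demo_cols].values
X_train_demo = demo.loc[~is_test, demo_cols].values
y_test_demo = demo.loc[is_test, "label"].values
y_train_demo = demo.loc[~is_test, "label"].values
print(len(X_train_demo), len(X_test_demo))
```

Reusing a single boolean mask keeps the feature and label rows aligned without repeating the comparison.
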
In [218]:
classifiers = {}
for cat in categories:
    print(cat)
    Y_test = fit_dat.loc[fit_dat['random'] < test_frac, cat].values
    Y_train = fit_dat.loc[fit_dat['random'] >= test_frac, cat].values    

    logreg = linear_model.LogisticRegressionCV() #class_weight={0: .05, 1: .95})
    logreg.fit(X_train, Y_train)    

    ranked_df = pd.DataFrame([{'icd9':i[0], 'weight': i[1]} for i in zip(feature_cols, logreg.coef_[0,:])]).\
      set_index('icd9')    
    ranked_df = code_lookup_df.join(ranked_df).sort_values('weight', ascending=False)
    display.display(ranked_df.head(10))

    Y_pred = logreg.predict_proba(X_test)[:, 1]
    
    [fpr, tpr, thresh] = metrics.roc_curve(Y_test, Y_pred)
    auc = metrics.auc(fpr, tpr)
    thresh_ind = np.abs(tpr-0.5).argmin()

    plt.plot(fpr, tpr)
    plt.plot(fpr[thresh_ind], tpr[thresh_ind], marker='.', markersize=10)
    plt.plot([0, 1],[0, 1],'k--')
    plt.grid(True)
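    plt.gca().set_aspect('equal')  # set the aspect on the current axes rather than creating new ones
    plt.title(cat)
    plt.xlabel('False positive rate (1 - specificity)')  # the x-axis plots fpr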
    plt.ylabel('Sensitivity (TPR)')

    fig_path = pl.Path(path_config['results_dir']).joinpath('{}_{}_log_reg_roc.png'.format(time_str, cat))
    print('Saving figure to {}'.format(fig_path))
    plt.savefig(fig_path.as_posix())
    
    plt.show()

    print('AUC = {}'.format(auc))
    print('0.5 Sensitivity Probability Threshold = {}'.format(thresh[thresh_ind]))    
    
    print('Confusion matrix:  [TN FP; FN, TP]')
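    # Note: roc_curve treats score >= threshold as positive; the strict '>' here leaves out
    # notes scoring exactly at the threshold, so sensitivity can land slightly below 0.5.
    print(metrics.confusion_matrix(Y_test, Y_pred > thresh[thresh_ind]))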
    print('----------------------------------')
    
    classifiers[cat] = {
        'classifier': logreg,
        'threshold': thresh[thresh_ind]
    }
Advanced.Cancer
descr weight
icd9
(code, 190-199) MALIGNANT NEOPLASM OF OTHER AND UNSPECIFIED SITES 2.526317
(code, V10) Personal history of malignant neoplasm 1.244528
(code, 160-165) MALIGNANT NEOPLASM OF RESPIRATORY AND INTRATHORACIC ORGANS 0.783067
(code, 235-238) NEOPLASMS OF UNCERTAIN BEHAVIOR 0.749679
(code, 150-159) MALIGNANT NEOPLASM OF DIGESTIVE ORGANS AND PERITONEUM 0.660028
(code, 510-519) OTHER DISEASES OF RESPIRATORY SYSTEM 0.545930
(code, V13) Personal history of other diseases 0.542035
(code, E930) Antibiotics 0.514473
(code, E933) Primarily systemic agents 0.493461
(code, 260-269) NUTRITIONAL DEFICIENCIES 0.492130
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Advanced.Cancer_log_reg_roc.png
AUC = 0.7756792577866136
0.5 Sensitivity Probability Threshold = 0.10394196524343723
Confusion matrix:  [TN FP; FN, TP]
[[479  24]
 [ 13  11]]
----------------------------------
Advanced.Heart.Disease
descr weight
icd9
(code, 420-429) OTHER FORMS OF HEART DISEASE 0.005062
(code, 410-414) ISCHEMIC HEART DISEASE 0.003937
(code, V45) Other postprocedural states 0.002815
(code, 580-589) NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS 0.001645
(code, 393-398) CHRONIC RHEUMATIC HEART DISEASE 0.001249
(code, 785) Symptoms involving cardiovascular system 0.000874
(code, 270-279) OTHER METABOLIC AND IMMUNITY DISORDERS 0.000752
(code, V58) Encounter for other and unspecified procedures and aftercare 0.000590
(code, V43) Organ or tissue replaced by other means 0.000558
(code, 240-246) DISORDERS OF THYROID GLAND 0.000547
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Advanced.Heart.Disease_log_reg_roc.png
AUC = 0.7463864306784661
0.5 Sensitivity Probability Threshold = 0.1389763286339258
Confusion matrix:  [TN FP; FN, TP]
[[348 104]
 [ 38  37]]
----------------------------------
Advanced.Lung.Disease
descr weight
icd9
(code, 730-739) OSTEOPATHIES, CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES 1.608568
(code, 510-519) OTHER DISEASES OF RESPIRATORY SYSTEM 1.206375
(code, V46) Other dependence on machines and devices 1.174741
(code, 460-466) ACUTE RESPIRATORY INFECTIONS 0.765785
(code, 490-496) CHRONIC OBSTRUCTIVE PULMONARY DISEASE AND ALLIED CONDITIONS 0.735720
(code, 240-246) DISORDERS OF THYROID GLAND 0.610108
(code, 340-349) OTHER DISORDERS OF THE CENTRAL NERVOUS SYSTEM 0.580053
(code, V49) Other conditions influencing health status 0.565586
(code, V02) Carrier or suspected carrier of infectious diseases 0.533711
(code, V13) Personal history of other diseases 0.514750
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Advanced.Lung.Disease_log_reg_roc.png
AUC = 0.8227696216826652
0.5 Sensitivity Probability Threshold = 0.19510015367599318
Confusion matrix:  [TN FP; FN, TP]
[[438  45]
 [ 23  21]]
----------------------------------
Alcohol.Abuse
descr weight
icd9
(code, 570-579) OTHER DISEASES OF DIGESTIVE SYSTEM 1.163718
(code, 290-299) PSYCHOSES 1.160067
(code, V60) Housing, household, and economic circumstances 1.134736
(code, 789) Other symptoms involving abdomen and pelvis 0.949451
(code, 300-316) NEUROTIC DISORDERS, PERSONALITY DISORDERS, AND OTHER NONPSYCHOTIC MENTAL DISORDERS 0.786526
(code, 070-079) OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE 0.775466
(code, V08) Asymptomatic human immunodeficiency virus [HIV] infection status 0.665920
(code, V15) Other personal history presenting hazards to health 0.638283
(code, V11) Personal history of mental disorder 0.594422
(code, E888) Other and unspecified fall 0.565786
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Alcohol.Abuse_log_reg_roc.png
AUC = 0.7530608435983576
0.5 Sensitivity Probability Threshold = 0.15219003911941675
Confusion matrix:  [TN FP; FN, TP]
[[433  37]
 [ 30  27]]
----------------------------------
Chronic.Neurological.Dystrophies
descr weight
icd9
(code, 340-349) OTHER DISORDERS OF THE CENTRAL NERVOUS SYSTEM 0.002613
(code, 590-599) OTHER DISEASES OF URINARY SYSTEM 0.001986
(code, 780) General symptoms 0.001867
(code, 330-337) HEREDITARY AND DEGENERATIVE DISEASES OF THE CENTRAL NERVOUS SYSTEM 0.001600
(code, 240-246) DISORDERS OF THYROID GLAND 0.001534
(code, 350-359) DISORDERS OF THE PERIPHERAL NERVOUS SYSTEM 0.001523
(code, 430-438) CEREBROVASCULAR DISEASE 0.001221
(code, 500-508) PNEUMOCONIOSES AND OTHER LUNG DISEASES DUE TO EXTERNAL AGENTS 0.001212
(code, 700-709) OTHER DISEASES OF SKIN AND SUBCUTANEOUS TISSUE 0.001206
(code, 249-259) DISEASES OF OTHER ENDOCRINE GLANDS 0.001183
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Chronic.Neurological.Dystrophies_log_reg_roc.png
AUC = 0.7241262346684033
0.5 Sensitivity Probability Threshold = 0.14620750976262598
Confusion matrix:  [TN FP; FN, TP]
[[362  82]
 [ 43  40]]
----------------------------------
Chronic.Pain.Fibromyalgia
descr weight
icd9
(code, 270-279) OTHER METABOLIC AND IMMUNITY DISORDERS 0.001499
(code, 730-739) OSTEOPATHIES, CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES 0.001358
(code, 030-041) OTHER BACTERIAL DISEASES 0.001335
(code, V58) Encounter for other and unspecified procedures and aftercare 0.001314
(code, 725-729) RHEUMATISM, EXCLUDING THE BACK 0.001144
(code, 580-589) NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS 0.001033
(code, 590-599) OTHER DISEASES OF URINARY SYSTEM 0.000974
(code, 710-719) ARTHROPATHIES AND RELATED DISORDERS 0.000923
(code, 530-538) DISEASES OF ESOPHAGUS, STOMACH, AND DUODENUM 0.000849
(code, 415-417) DISEASES OF PULMONARY CIRCULATION 0.000804
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Chronic.Pain.Fibromyalgia_log_reg_roc.png
AUC = 0.6382045539380365
0.5 Sensitivity Probability Threshold = 0.10599105530911747
Confusion matrix:  [TN FP; FN, TP]
[[357 113]
 [ 30  27]]
----------------------------------
Dementia
descr weight
icd9
(code, 290-299) PSYCHOSES 0.002410
(code, 330-337) HEREDITARY AND DEGENERATIVE DISEASES OF THE CENTRAL NERVOUS SYSTEM 0.000984
(code, 410-414) ISCHEMIC HEART DISEASE 0.000682
(code, V45) Other postprocedural states 0.000663
(code, 420-429) OTHER FORMS OF HEART DISEASE 0.000591
(code, 560-569) OTHER DISEASES OF INTESTINES AND PERITONEUM 0.000511
(code, 580-589) NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS 0.000466
(code, 590-599) OTHER DISEASES OF URINARY SYSTEM 0.000463
(code, 785) Symptoms involving cardiovascular system 0.000449
(code, 030-041) OTHER BACTERIAL DISEASES 0.000413
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Dementia_log_reg_roc.png
AUC = 0.757051282051282
0.5 Sensitivity Probability Threshold = 0.03856300219804359
Confusion matrix:  [TN FP; FN, TP]
[[459  48]
 [ 11   9]]
----------------------------------
Depression
descr weight
icd9
(code, 300-316) NEUROTIC DISORDERS, PERSONALITY DISORDERS, AND OTHER NONPSYCHOTIC MENTAL DISORDERS 0.003354
(code, 290-299) PSYCHOSES 0.001928
(code, 530-538) DISEASES OF ESOPHAGUS, STOMACH, AND DUODENUM 0.001889
(code, 070-079) OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE 0.001476
(code, 580-589) NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS 0.001269
(code, 350-359) DISORDERS OF THE PERIPHERAL NERVOUS SYSTEM 0.001053
(code, V45) Other postprocedural states 0.001042
(code, 730-739) OSTEOPATHIES, CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES 0.000959
(code, V60) Housing, household, and economic circumstances 0.000855
(code, 490-496) CHRONIC OBSTRUCTIVE PULMONARY DISEASE AND ALLIED CONDITIONS 0.000694
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Depression_log_reg_roc.png
AUC = 0.6824449748077434
0.5 Sensitivity Probability Threshold = 0.16291146320902805
Confusion matrix:  [TN FP; FN, TP]
[[304 115]
 [ 55  53]]
----------------------------------
Developmental.Delay.Retardation
descr weight
icd9
(code, 317-319) MENTAL RETARDATION 4.468493
(code, 150-159) MALIGNANT NEOPLASM OF DIGESTIVE ORGANS AND PERITONEUM 1.593733
(code, E939) Psychotropic agents 1.532755
(code, 500-508) PNEUMOCONIOSES AND OTHER LUNG DISEASES DUE TO EXTERNAL AGENTS 1.451041
(code, 480-488) PNEUMONIA AND INFLUENZA 1.393855
(code, 780) General symptoms 1.289353
(code, 290-299) PSYCHOSES 1.198573
(code, 240-246) DISORDERS OF THYROID GLAND 1.193123
(code, 690-698) OTHER INFLAMMATORY CONDITIONS OF SKIN AND SUBCUTANEOUS TISSUE 1.041017
(code, 555-558) NONINFECTIOUS ENTERITIS AND COLITIS 0.975399
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Developmental.Delay.Retardation_log_reg_roc.png
AUC = 0.968956043956044
0.5 Sensitivity Probability Threshold = 0.29711871330799156
Confusion matrix:  [TN FP; FN, TP]
[[515   5]
 [  5   2]]
----------------------------------
Non.Adherence
descr weight
icd9
(code, 530-538) DISEASES OF ESOPHAGUS, STOMACH, AND DUODENUM 0.001452
(code, V15) Other personal history presenting hazards to health 0.001374
(code, 350-359) DISORDERS OF THE PERIPHERAL NERVOUS SYSTEM 0.001329
(code, 580-589) NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS 0.001171
(code, 360-379) DISORDERS OF THE EYE AND ADNEXA 0.000959
(code, 070-079) OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE 0.000900
(code, V58) Encounter for other and unspecified procedures and aftercare 0.000789
(code, 270-279) OTHER METABOLIC AND IMMUNITY DISORDERS 0.000754
(code, 300-316) NEUROTIC DISORDERS, PERSONALITY DISORDERS, AND OTHER NONPSYCHOTIC MENTAL DISORDERS 0.000565
(code, V60) Housing, household, and economic circumstances 0.000385
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Non.Adherence_log_reg_roc.png
AUC = 0.8049337957124844
0.5 Sensitivity Probability Threshold = 0.07632155947215179
Confusion matrix:  [TN FP; FN, TP]
[[447  41]
 [ 20  19]]
----------------------------------
None
descr weight
icd9
(code, V42) Organ or tissue replaced by transplant 0.210770
(code, 996-999) COMPLICATIONS OF SURGICAL AND MEDICAL CARE, NOT ELSEWHERE CLASSIFIED 0.199843
(code, 001-009) INTESTINAL INFECTIOUS DISEASES 0.121922
(code, 786) Symptoms involving respiratory system and other chest symptoms 0.121564
(code, 480-488) PNEUMONIA AND INFLUENZA 0.110942
(code, 440-449) DISEASES OF ARTERIES, ARTERIOLES, AND CAPILLARIES 0.107852
(code, 360-379) DISORDERS OF THE EYE AND ADNEXA 0.104099
(code, 420-429) OTHER FORMS OF HEART DISEASE 0.100312
(code, 788) Symptoms involving urinary system 0.099602
(code, 393-398) CHRONIC RHEUMATIC HEART DISEASE 0.095794
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_None_log_reg_roc.png
AUC = 0.6605450236966824
0.5 Sensitivity Probability Threshold = 0.6548677574348872
Confusion matrix:  [TN FP; FN, TP]
[[154  57]
 [164 152]]
----------------------------------
Obesity
descr weight
icd9
(code, 270-279) OTHER METABOLIC AND IMMUNITY DISORDERS 0.001250
(code, 700-709) OTHER DISEASES OF SKIN AND SUBCUTANEOUS TISSUE 0.001237
(code, 510-519) OTHER DISEASES OF RESPIRATORY SYSTEM 0.000885
(code, 415-417) DISEASES OF PULMONARY CIRCULATION 0.000885
(code, 327) ORGANIC SLEEP DISORDERS 0.000819
(code, 580-589) NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS 0.000713
(code, 680-686) INFECTIONS OF SKIN AND SUBCUTANEOUS TISSUE 0.000630
(code, 780) General symptoms 0.000586
(code, V58) Encounter for other and unspecified procedures and aftercare 0.000498
(code, 590-599) OTHER DISEASES OF URINARY SYSTEM 0.000498
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Obesity_log_reg_roc.png
AUC = 0.6649800796812749
0.5 Sensitivity Probability Threshold = 0.04417569663751994
Confusion matrix:  [TN FP; FN, TP]
[[387 115]
 [ 14  11]]
----------------------------------
Other.Substance.Abuse
descr weight
icd9
(code, 960-979) POISONING BY DRUGS, MEDICINAL AND BIOLOGICAL SUBSTANCES 2.868189
(code, 070-079) OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE 2.261325
(code, V60) Housing, household, and economic circumstances 2.162543
(code, E939) Psychotropic agents 1.824646
(code, E888) Other and unspecified fall 1.799809
(code, E935) Analgesics, antipyretics, and antirheumatics 1.577303
(code, 110-118) MYCOSES 1.473258
(code, 300-316) NEUROTIC DISORDERS, PERSONALITY DISORDERS, AND OTHER NONPSYCHOTIC MENTAL DISORDERS 1.347367
(code, E928) Other and unspecified environmental and accidental causes 1.288913
(code, E854) Accidental poisoning by other psychotropic agents 1.189443
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Other.Substance.Abuse_log_reg_roc.png
AUC = 0.8683927932227659
0.5 Sensitivity Probability Threshold = 0.33163980391797443
Confusion matrix:  [TN FP; FN, TP]
[[477  16]
 [ 18  16]]
----------------------------------
Schizophrenia.and.other.Psychiatric.Disorders
descr weight
icd9
(code, 290-299) PSYCHOSES 0.003059
(code, 300-316) NEUROTIC DISORDERS, PERSONALITY DISORDERS, AND OTHER NONPSYCHOTIC MENTAL DISORDERS 0.002363
(code, 730-739) OSTEOPATHIES, CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES 0.001407
(code, 340-349) OTHER DISORDERS OF THE CENTRAL NERVOUS SYSTEM 0.001083
(code, 070-079) OTHER DISEASES DUE TO VIRUSES AND CHLAMYDIAE 0.001064
(code, 240-246) DISORDERS OF THYROID GLAND 0.000945
(code, V60) Housing, household, and economic circumstances 0.000929
(code, 030-041) OTHER BACTERIAL DISEASES 0.000914
(code, 270-279) OTHER METABOLIC AND IMMUNITY DISORDERS 0.000852
(code, 787) Symptoms involving digestive system 0.000769
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Schizophrenia.and.other.Psychiatric.Disorders_log_reg_roc.png
AUC = 0.7535087719298247
0.5 Sensitivity Probability Threshold = 0.11327678496224783
Confusion matrix:  [TN FP; FN, TP]
[[394  76]
 [ 30  27]]
----------------------------------
Unsure
descr weight
icd9
(code, 420-429) OTHER FORMS OF HEART DISEASE 0.001652
(code, 270-279) OTHER METABOLIC AND IMMUNITY DISORDERS 0.001351
(code, 580-589) NEPHRITIS, NEPHROTIC SYNDROME, AND NEPHROSIS 0.001012
(code, V58) Encounter for other and unspecified procedures and aftercare 0.000998
(code, 730-739) OSTEOPATHIES, CHONDROPATHIES, AND ACQUIRED MUSCULOSKELETAL DEFORMITIES 0.000880
(code, 440-449) DISEASES OF ARTERIES, ARTERIOLES, AND CAPILLARIES 0.000816
(code, 790) Nonspecific findings on examination of blood 0.000695
(code, 393-398) CHRONIC RHEUMATIC HEART DISEASE 0.000681
(code, 780) General symptoms 0.000679
(code, 240-246) DISORDERS OF THYROID GLAND 0.000676
Saving figure to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_Unsure_log_reg_roc.png
AUC = 0.5328101155439284
0.5 Sensitivity Probability Threshold = 0.18846364109275968
Confusion matrix:  [TN FP; FN, TP]
[[213 204]
 [ 56  54]]
----------------------------------
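
To see what these confusion matrices mean for note selection, the arithmetic below works through the Advanced.Cancer matrix from the output above. The "enrichment" figure (precision divided by prevalence) is a summary introduced here for illustration, not something computed in the notebook.

```python
# Advanced.Cancer confusion matrix from the output above: [TN FP; FN TP]
tn, fp = 479, 24
fn, tp = 13, 11

total = tn + fp + fn + tp
prevalence = (tp + fn) / total   # share of test notes with the concept
sensitivity = tp / (tp + fn)     # share of positive notes that get flagged
specificity = tn / (tn + fp)     # share of negative notes left unflagged
precision = tp / (tp + fp)       # share of flagged notes that are positive
enrichment = precision / prevalence

print(f"prevalence={prevalence:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f} precision={precision:.3f} "
      f"enrichment={enrichment:.1f}x")
```

In other words, notes flagged by this classifier contain the concept several times more often than notes drawn at random from the test set, at the cost of missing about half of the positive notes.
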
In [202]:
list(path_config.keys())
Out[202]:
['log_dir', 'input_dir', 'results_dir', 'repo_data_dir']
In [213]:
class_dat = {
    'classifiers': classifiers,
    'features': feature_cols
}
clf_path = pl.Path(path_config['results_dir']).joinpath('{}_icd9_log_reg.pkl'.format(time_str))
print('Saving classifiers to {}'.format(clf_path))
with open(clf_path.as_posix(), 'wb') as f:
    pkl.dump(class_dat, f)
Saving classifiers to /mnt/cbds_homes/ecarlson/results/mit_frequent_fliers/2016-10-24-20-30_icd9_log_reg.pkl
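
A later step would load this file and apply a stored classifier with its threshold. The round trip below is a hedged sketch on a made-up structure with the same keys ('classifiers', 'features', 'classifier', 'threshold'); the category name, feature names, file name, and threshold value are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in for the saved structure: one category, three features.
demo_features = ["code_a", "code_b", "code_c"]
X_demo = rng.integers(0, 2, size=(200, 3)).astype(float)
# Label is positive about 90% of the time when code_a is present, 10% otherwise.
y_demo = (0.8 * X_demo[:, 0] + rng.random(200) > 0.9).astype(int)
saved = {
    "classifiers": {
        "Demo.Category": {
            "classifier": LogisticRegression().fit(X_demo, y_demo),
            "threshold": 0.5,
        }
    },
    "features": demo_features,
}

# Write and re-read the structure, mirroring the dump in the cell above.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_icd9_log_reg.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump(saved, f)
    with open(demo_path, "rb") as f:
        loaded = pickle.load(f)

entry = loaded["classifiers"]["Demo.Category"]
# Columns of new data must follow the order stored in loaded["features"].
new_notes = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
scores = entry["classifier"].predict_proba(new_notes)[:, 1]
selected = scores >= entry["threshold"]
print(scores, selected)
```

Storing the feature list next to the classifiers, as the notebook does, is what makes it possible to rebuild the input columns in the right order later.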