{"id":866,"date":"2024-02-07T11:22:57","date_gmt":"2024-02-07T11:22:57","guid":{"rendered":"https:\/\/serhatdiker.com\/?p=866"},"modified":"2025-05-10T11:19:41","modified_gmt":"2025-05-10T11:19:41","slug":"diyabet-tahmini-uygulamasi","status":"publish","type":"post","link":"https:\/\/serhatdiker.com\/index.php\/2024\/02\/07\/diyabet-tahmini-uygulamasi\/","title":{"rendered":"Diabetes Prediction Application: Logistic Regression &amp; ML"},"content":{"rendered":"<style>\/*! elementor - v3.19.0 - 05-02-2024 *\/\n.elementor-heading-title{padding:0;margin:0;line-height:1}.elementor-widget-heading .elementor-heading-title[class*=elementor-size-]>a{color:inherit;font-size:inherit;line-height:inherit}.elementor-widget-heading .elementor-heading-title.elementor-size-small{font-size:15px}.elementor-widget-heading .elementor-heading-title.elementor-size-medium{font-size:19px}.elementor-widget-heading .elementor-heading-title.elementor-size-large{font-size:29px}.elementor-widget-heading .elementor-heading-title.elementor-size-xl{font-size:39px}.elementor-widget-heading .elementor-heading-title.elementor-size-xxl{font-size:59px}<\/style>\n<h2>Diabetes Prediction Application: Logistic Regression &amp; ML<\/h2>\n<p><strong>Dataset Used:<\/strong><\/p>\n<p><a href=\"https:\/\/www.kaggle.com\/datasets\/iammustafatz\/diabetes-prediction-dataset\">The Kaggle \u00abDiabetes prediction dataset\u00bb.<\/a><br \/>100,000 rows of data<\/p>\n<p>(Dataset: <a href=\"https:\/\/serhatdiker.com\/wp-content\/uploads\/2024\/02\/diabetes_prediction_dataset.csv\">diabetes_prediction_dataset<\/a>)<\/p>\n<p>Dataset fields:<\/p>\n<p>gender: Gender<br \/>age: Age<br \/>hypertension: Hypertension<br \/>heart_disease: Heart disease<br \/>smoking_history: Smoking history<br \/>bmi: Body mass index<br \/>HbA1c_level: HbA1c level (glycated hemoglobin level)<br \/>blood_glucose_level: Blood glucose level<br 
\/>diabetes: Diabetes<\/p>\n<p><strong>Libraries Used:<\/strong><\/p>\n<p><strong>pandas:<\/strong> Used for data manipulation and analysis. It is essential for reading the dataset, building data frames, and preprocessing the data.<br \/><strong>numpy:<\/strong> Used for numerical computation, including matrix operations and mathematical functions.<br \/><strong>sklearn (Scikit-learn):<\/strong> Used to apply machine learning algorithms. It also provides the tools needed for data preprocessing, model selection, and evaluation.<br \/><strong>sklearn.model_selection:<\/strong> Used to split the dataset into training and test sets.<br \/><strong>sklearn.linear_model:<\/strong> Contains linear models such as logistic regression.<br \/><strong>sklearn.metrics:<\/strong> Used to compute model performance metrics.<br \/><strong>matplotlib:<\/strong> Used for data visualization and plotting.<br \/><strong>seaborn:<\/strong> A data visualization library built on top of Matplotlib. 
It is used to create more attractive and informative plots.<\/p>\n<p>Python application code:<\/p>\n<pre lang=\"python\" escaped=\"true\">\nimport pandas as pd\nimport numpy as np\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.metrics import confusion_matrix, roc_curve, auc, accuracy_score, precision_score, recall_score, f1_score, roc_auc_score\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n# Load the data and one-hot encode the categorical columns\ndata = pd.read_csv(\"diabetes_prediction_dataset.csv\")\ndata = pd.get_dummies(data, columns=[\"gender\", \"smoking_history\"], drop_first=True)\nX = data.drop(\"diabetes\", axis=1)\ny = data[\"diabetes\"]\n# Hold out 20% of the rows as the test set\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\nmodel = LogisticRegression(max_iter=1000)\nmodel.fit(X_train, y_train)\ny_pred = model.predict(X_test)\naccuracy = accuracy_score(y_test, y_pred)\nprecision = precision_score(y_test, y_pred)\nrecall = recall_score(y_test, y_pred)\nf1 = f1_score(y_test, y_pred)\nroc_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])\nprint(\"Accuracy (Do\u011fruluk):\", accuracy)\nprint(\"Precision (Kesinlik):\", precision)\nprint(\"Recall (Duyarl\u0131l\u0131k):\", recall)\nprint(\"F1 Score:\", f1)\nprint(\"ROC AUC Score (ROC E\u011frisi):\", roc_auc)\ncm = confusion_matrix(y_test, y_pred)\nplt.figure(figsize=(8, 6))\nsns.heatmap(cm, annot=True, fmt=\"d\", cmap='Blues')\nclass_names = [0, 1]\n# Heatmap cells are centered at 0.5 and 1.5, so the ticks go there\ntick_marks = np.arange(len(class_names)) + 0.5\nplt.xticks(tick_marks, class_names)\nplt.yticks(tick_marks, class_names)\n# Rows are true labels and columns are predicted labels;\n# each caption sits just above the count in its cell\nplt.text(0.5, 0.25, \"True Negative\", va='center', ha='center')\nplt.text(1.5, 0.25, \"False Positive\", va='center', ha='center')\nplt.text(0.5, 1.25, \"False Negative\", va='center', ha='center')\nplt.text(1.5, 1.25, \"True Positive\", va='center', ha='center')\nplt.title('Confusion Matrix')\nplt.ylabel('True 
Label')\nplt.xlabel('Predicted Label')\nplt.show()\n# Predicted probability of the positive class\ny_pred_proba = model.predict_proba(X_test)[:, 1]\nfpr, tpr, _ = roc_curve(y_test, y_pred_proba)\n# Use a separate name so the imported auc() function is not overwritten\nauc_value = auc(fpr, tpr)\nplt.figure(figsize=(8, 6))\nplt.plot(fpr, tpr, label=\"AUC=\"+str(auc_value))\nplt.xlabel('False Positive Rate')\nplt.ylabel('True Positive Rate')\nplt.title('ROC Curve')\nplt.legend(loc=4)\nplt.show()\n<\/pre>\n<p><strong>Output:<\/strong><\/p>\n<p>Accuracy (Do\u011fruluk): 0.959<br \/>Precision (Kesinlik): 0.8633387888707038<br \/>Recall (Duyarl\u0131l\u0131k): 0.6176814988290398<br \/>F1 Score: 0.720136518771331<br \/>ROC AUC Score (ROC E\u011frisi): 0.9616922634432529<\/p>\n<p><strong>Interpreting the Results:<\/strong><\/p>\n<p><strong>Accuracy:<\/strong> 95.9%. The model predicted about 95.9% of the test set correctly. This is a high rate and generally suggests that the model works well. We should be careful, however: if the classes are imbalanced (for example, many negative examples and very few positive ones), accuracy can be misleading.<br \/><strong>Precision:<\/strong> 86.3%. Of the cases the model predicted as positive, 86.3% are actually positive. This is also fairly high and shows that the model is unlikely to produce false positives.<br \/><strong>Recall:<\/strong> 61.8%. The model correctly identified 61.8% of the actual positive cases. 
This rate is noticeably lower than precision, which means the model misses some positive cases.<br \/><strong>F1 Score:<\/strong> 72%. This shows that the model strikes a moderate balance between precision and recall: it keeps false positives low while missing some true positives.<br \/><strong>ROC AUC:<\/strong> 96.2%. This value shows that the model's overall classification performance is excellent. AUC is the probability that the model assigns a higher predicted probability to a randomly chosen positive example than to a randomly chosen negative example.<\/p>\n<p><strong>Comment:<\/strong><br \/>Overall, we can say that the model performs well. The ROC AUC and accuracy values are quite high. However, recall is low relative to precision, which shows that the model misses some true positive cases. This matters especially in medical testing, because missing true positive cases (patients) is undesirable. We may need to improve the model's ability to reduce false negatives (cases that are actually positive but predicted as negative). 
To do this, we could consider tuning the model's hyperparameters or applying different feature engineering techniques.<\/p>\n<p><strong>Confusion Matrix Output<\/strong><\/p>\n<style>\/*! elementor - v3.19.0 - 05-02-2024 *\/\n.elementor-widget-image{text-align:center}.elementor-widget-image a{display:inline-block}.elementor-widget-image a img[src$=\".svg\"]{width:48px}.elementor-widget-image img{vertical-align:middle;display:inline-block}<\/style>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"675\" height=\"506\" src=\"https:\/\/serhatdiker.com\/wp-content\/uploads\/2024\/02\/Resim10.png\" alt=\"\" srcset=\"https:\/\/serhatdiker.com\/wp-content\/uploads\/2024\/02\/Resim10.png 675w, https:\/\/serhatdiker.com\/wp-content\/uploads\/2024\/02\/Resim10-300x225.png 300w\" sizes=\"auto, (max-width: 675px) 100vw, 675px\" \/><\/p>\n<p><strong>TN (True Negative):<\/strong> The number of examples that are actually 0 and are predicted as 0 by the model.<\/p>\n<p><strong>TP (True Positive):<\/strong> The number of examples that are actually 1 and are predicted as 1 by the model.<\/p>\n<p><strong>FN (False Negative):<\/strong> The number of examples that are actually 1 but are predicted as 0 by the model.<\/p>\n<p><strong>FP (False Positive):<\/strong> The number of examples that are actually 0 but are predicted as 1 by the model.<\/p>\n<p><strong>Correct Predictions:<\/strong> The model's predictions are mostly correct. 
It is especially successful at predicting the examples that are actually 0.<\/p>\n<p><strong>Problem with Recall (Sensitivity):<\/strong> The model wrongly predicts some of the examples that are actually 1 as 0 (653 false negatives, consistent with the reported recall of 1055\/(1055+653), about 0.618). This means the model misses some diabetes cases.<\/p>\n<p><strong>False Alarm Rate:<\/strong> The model wrongly classified 167 people who do not have diabetes as diabetic (consistent with the reported precision of 1055\/(1055+167), about 0.863). False alarms are therefore much rarer than missed cases.<\/p>\n<p><strong>Correctly Detecting Diabetes:<\/strong> The model correctly identified 1055 people who actually have diabetes.<\/p>\n<p><strong>ROC Curve Output<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"675\" height=\"506\" src=\"https:\/\/serhatdiker.com\/wp-content\/uploads\/2024\/02\/Resim11.png\" alt=\"\" srcset=\"https:\/\/serhatdiker.com\/wp-content\/uploads\/2024\/02\/Resim11.png 675w, https:\/\/serhatdiker.com\/wp-content\/uploads\/2024\/02\/Resim11-300x225.png 300w\" sizes=\"auto, (max-width: 675px) 100vw, 675px\" \/><\/p>\n<p>The ROC curve is a graphical method for evaluating the performance of classifiers. 
The curve shows the relationship between the true positive rate (recall) and the false positive rate.<\/p>\n<p><strong>True Positive Rate (TPR), also called Recall or Sensitivity:<\/strong> The share of actual positive examples that the model classifies as positive.<\/p>\n<p>True Positive Rate = TP\/(TP+FN)<\/p>\n<p><strong>False Positive Rate (FPR):<\/strong> The share of actual negative examples that the model wrongly classifies as positive.<\/p>\n<p>False Positive Rate = FP\/(FP+TN)<\/p>\n<p>If a ROC curve lies close to the top-left corner (that is, it hugs the y-axis), the model performs well. If the curve lies close to the line y=x (the 45-degree diagonal), the model is no better than random guessing.<\/p>\n<p>The ROC AUC (Area Under the Curve) score is the area under the ROC curve and summarizes the classifier's performance in a single value. When the ROC AUC score is close to 1, the model's performance is excellent. If the ROC AUC score is close to 0.5, the model is no better than random guessing.<\/p>\n<p>This is the Special Topic Project I prepared for the Biostatistics Applications course, one of my master's courses. With my respect and thanks to the course instructor, Prof. Dr. 
Filiz KARAMAN.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Diabetes Prediction Application: Logistic Regression &amp; ML Dataset Used: The Kaggle \u00abDiabetes prediction dataset\u00bb. 100,000 rows of data (Dataset: diabetes_prediction_dataset) Dataset fields: gender: Gender age:&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":984,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[27,210,221,205,33,219,206,30,207,29,220,32,214,209,218,222,217,211,213,215,208,216,212,31,49,28],"class_list":["post-866","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-projects","tag-biyoistatistik","tag-confusion-matrix","tag-diabetes-ml-modeli","tag-diabetes-prediction","tag-diyabet-tahmini","tag-f1-skoru","tag-logistic-regression","tag-lojistik-regresyon","tag-machine-learning","tag-makine-ogrenmesi","tag-medical-data-analysis","tag-python","tag-python-ml","tag-roc-auc","tag-roc-egrisi","tag-saglik-istatistikleri","tag-saglik-tahmin-modeli","tag-saglik-verisi-analizi","tag-saglikta-yapay-zeka","tag-scikit-learn","tag-sklearn","tag-tibbi-siniflandirma","tag-tibbi-veri-bilimi","tag-veri-bilimi","tag-veri-gorsellestirme","tag-yapay-zeka"],"_links":{"self":[{"href":"https:\/\/serhatdiker.com\/index.php\/wp-json\/wp\/v2\/posts\/866","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/serhatdiker.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/serhatdiker.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/serhatdiker.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/serhatdiker.com\/index.php\/wp-json\/wp\/v2\/comments?post=866"}],"version-history":[{"count":8,"href":"https:\/\/serhatdiker.com\/index.php\/wp-json\/wp\/v2\
/posts\/866\/revisions"}],"predecessor-version":[{"id":884,"href":"https:\/\/serhatdiker.com\/index.php\/wp-json\/wp\/v2\/posts\/866\/revisions\/884"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/serhatdiker.com\/index.php\/wp-json\/wp\/v2\/media\/984"}],"wp:attachment":[{"href":"https:\/\/serhatdiker.com\/index.php\/wp-json\/wp\/v2\/media?parent=866"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/serhatdiker.com\/index.php\/wp-json\/wp\/v2\/categories?post=866"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/serhatdiker.com\/index.php\/wp-json\/wp\/v2\/tags?post=866"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}