{"version":"https://jsonfeed.org/version/1.1","title":"苏迟但到's Homepage","home_page_url":"https://kexohproject.pages.dev","feed_url":"https://kexohproject.pages.dev/json/","description":"<p>Hello, and welcome to my personal homepage!</p><p>I specialize in cryptography, security analysis, digital watermarking, and related techniques.</p><p>You can reach me at findmykexin@gmail.com or by private message on Zhihu.</p><p>My Zhihu profile: <a href=\"https://www.zhihu.com/people/su-chi-dan-dao\" rel=\"noopener noreferrer\" target=\"_blank\">苏迟但到 - Zhihu (zhihu.com)</a></p><p>My GitHub profile: <a href=\"https://github.com/kexinoh\" rel=\"noopener noreferrer\" target=\"_blank\">kexinoh</a></p>","icon":"https://kexohcdn.gptapi.cyou/kexohproject/production/images/channel-2e54d141ee195646ca12a9d16507a908.jpg","favicon":"https://kexohcdn.gptapi.cyou/kexohproject/production/images/favicon-340a2925d02a0386f3b954a032834917.jpg","authors":[{"name":"苏迟但到"}],"language":"zh-cn","items":[{"id":"gnz2wf_pXwB","title":"Why does overfitting occur when training data is scarce? What is the root cause of overfitting?","content_html":"<p></p><p data-pid=\"W_Kv_A41\">This post is aimed at readers with some background in deep learning.</p><p data-pid=\"gwoRdALN\">First, be clear that a neural network is essentially function fitting, and that our method is to make the fitted curve deviate from the sample set as little as possible.</p><figure data-size=\"normal\"><noscript><img src=\"https://pic2.zhimg.com/v2-6f89c20b9fe763292fa7eab084f06271_b.jpg\" data-size=\"normal\" data-rawwidth=\"1646\" data-rawheight=\"976\" data-original-token=\"v2-53a584d8210dec4c1eb6deac8a18054f\" data-default-watermark-src=\"https://pic3.zhimg.com/v2-47092ff19c18fba9aa9ea7969836367e_b.jpg\" class=\"origin_image zh-lightbox-thumb\" width=\"1646\" data-original=\"https://pic2.zhimg.com/v2-6f89c20b9fe763292fa7eab084f06271_r.jpg\"/></noscript><img src=\"data:image/svg+xml;utf8,&lt;svg xmlns=&#39;http://www.w3.org/2000/svg&#39; width=&#39;1646&#39; height=&#39;976&#39;&gt;&lt;/svg&gt;\" data-size=\"normal\" data-rawwidth=\"1646\" data-rawheight=\"976\" data-original-token=\"v2-53a584d8210dec4c1eb6deac8a18054f\" data-default-watermark-src=\"https://pic3.zhimg.com/v2-47092ff19c18fba9aa9ea7969836367e_b.jpg\" class=\"origin_image zh-lightbox-thumb lazy\" width=\"1646\" 
data-original=\"https://pic2.zhimg.com/v2-6f89c20b9fe763292fa7eab084f06271_r.jpg\" data-actualsrc=\"https://pic2.zhimg.com/v2-6f89c20b9fe763292fa7eab084f06271_b.jpg\"/><figcaption>Natural curve</figcaption></figure><figure data-size=\"normal\"><noscript><img src=\"https://pic3.zhimg.com/v2-a7cc8457377b1053b06f1d09d3b55c62_b.jpg\" data-size=\"normal\" data-rawwidth=\"1506\" data-rawheight=\"904\" data-original-token=\"v2-a649be9e88dda9f390821d03cdaca6f1\" data-default-watermark-src=\"https://pic2.zhimg.com/v2-6d8ea16fb5d3b7a83b31544017136111_b.jpg\" class=\"origin_image zh-lightbox-thumb\" width=\"1506\" data-original=\"https://pic3.zhimg.com/v2-a7cc8457377b1053b06f1d09d3b55c62_r.jpg\"/></noscript><img src=\"data:image/svg+xml;utf8,&lt;svg xmlns=&#39;http://www.w3.org/2000/svg&#39; width=&#39;1506&#39; height=&#39;904&#39;&gt;&lt;/svg&gt;\" data-size=\"normal\" data-rawwidth=\"1506\" data-rawheight=\"904\" data-original-token=\"v2-a649be9e88dda9f390821d03cdaca6f1\" data-default-watermark-src=\"https://pic2.zhimg.com/v2-6d8ea16fb5d3b7a83b31544017136111_b.jpg\" class=\"origin_image zh-lightbox-thumb lazy\" width=\"1506\" data-original=\"https://pic3.zhimg.com/v2-a7cc8457377b1053b06f1d09d3b55c62_r.jpg\" data-actualsrc=\"https://pic3.zhimg.com/v2-a7cc8457377b1053b06f1d09d3b55c62_b.jpg\"/><figcaption>Overfitted curve</figcaption></figure><p data-pid=\"0MMpEiRy\">When the sample set is too small, its sampling becomes uneven, so the fitted curve deviates from the true curve, which results in overfitting.</p><p data-pid=\"-2UEudxw\">So the root cause is uneven sampling of the sample set.</p><p data-pid=\"h-9Z6fuP\">Does a large sample set necessarily mean the sampling is even? No.</p><p data-pid=\"oyasKHkz\">But using the law of large numbers and probability theory, we can show that the smaller the sample set, the more likely uneven sampling is to occur.</p><p data-pid=\"DRrOZ6vW\">If we already knew the optimal prediction model (or if you could use the test set as the training set), we could use that model to select samples in reverse and build a very small sample set that still does not cause overfitting.</p><p data-pid=\"gBv8dWXc\">The problem is that we cannot obtain the optimal prediction model before training is finished, so when initially selecting data we should sample as much as possible to avoid unevenness.</p>","content_text":"This post is aimed at readers with some background in deep learning.\n\nFirst, be clear that a neural network is essentially function fitting, and that our method is to make the fitted curve deviate from the sample set as little as possible.\n\n[https://pic2.zhimg.com/v2-6f89c20b9fe763292fa7eab084f06271_b.jpg][data:image/svg+xml;utf8,&lt;svg\nxmlns=&#39;http://www.w3.org/2000/svg&#39; 
width=&#39;1646&#39;\nheight=&#39;976&#39;&gt;&lt;/svg&gt;]Natural curve[https://pic3.zhimg.com/v2-a7cc8457377b1053b06f1d09d3b55c62_b.jpg][data:image/svg+xml;utf8,&lt;svg\nxmlns=&#39;http://www.w3.org/2000/svg&#39; width=&#39;1506&#39;\nheight=&#39;904&#39;&gt;&lt;/svg&gt;]Overfitted curve\n\nWhen the sample set is too small, its sampling becomes uneven, so the fitted curve deviates from the true curve, which results in overfitting.\n\nSo the root cause is uneven sampling of the sample set.\n\nDoes a large sample set necessarily mean the sampling is even? No.\n\nBut using the law of large numbers and probability theory, we can show that the smaller the sample set, the more likely uneven sampling is to occur.\n\nIf we already knew the optimal prediction model (or if you could use the test set as the training set), we could use that model to select samples in reverse and build a very small sample set that still does not cause overfitting.\n\nThe problem is that we cannot obtain the optimal prediction model before training is finished, so when initially selecting data we should sample as much as possible to avoid unevenness.","date_published":"2023-10-20T15:09:54.000Z","_microfeed":{"web_url":"https://kexohproject.pages.dev/i/为什么训练数据量少时-会出现过拟合现象-过拟合现象的根源是什么-gnz2wf_pXwB/","json_url":"https://kexohproject.pages.dev/i/gnz2wf_pXwB/json/","rss_url":"https://kexohproject.pages.dev/i/gnz2wf_pXwB/rss/","guid":"gnz2wf_pXwB","status":"published","itunes:title":"New Article Title for iTunes","date_published_short":"Fri Oct 20 2023","date_published_ms":1697814594000}}],"_microfeed":{"microfeed_version":"0.1.2","base_url":"https://kexohproject.pages.dev","categories":[{"name":"Education","categories":[{"name":"Language Learning"}]},{"name":"Technology"}],"subscribe_methods":[{"name":"RSS","type":"rss","url":"https://kexohproject.pages.dev/rss/","image":"https://kexohproject.pages.dev/assets/brands/subscribe/rss.png","enabled":true,"editable":false,"id":"4KlfbtkEfzy"},{"name":"JSON","type":"json","url":"https://kexohproject.pages.dev/json/","image":"https://kexohproject.pages.dev/assets/brands/subscribe/json.png","enabled":true,"editable":false,"id":"DVFm7TYiNSq"}],"description_text":"Hello, and welcome to my personal homepage!\n\nI specialize in cryptography, security analysis, digital watermarking, and related techniques.\n\nYou can reach me at findmykexin@gmail.com or by private message on Zhihu.\n\nMy Zhihu profile: 苏迟但到 - Zhihu (zhihu.com)\n\nMy GitHub profile: kexinoh","copyright":"©2024","itunes:type":"episodic","items_sort_order":"newest_first"}}