Building a visibility regression model using ground observation data (지상 관측 자료를 이용한 시정 회귀 모델 구축)

Abstract (Korean)

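On the Korean Peninsula, haze occurs frequently in spring and winter because of high aerosol concentrations. However, few studies have estimated aerosol-driven visibility. This study therefore built and validated a visibility regression model by applying spring and winter meteorological variables and PM concentration data to a random forest. Under-sampling was applied to the training data when building the model. To test the advantage of the random forest regression model, a multiple linear regression model was built from the same data and the two were compared. After under-sampling, the accuracy of the visibility regression model for low visibility increased. In the comparison of the two regression models, the random forest model showed better validation metrics than the multiple linear regression model, confirming its advantage. In the station-by-station validation, the regression model was stable at most stations, but some stations showed poor validation metrics. Analysis of the cause showed that, at some stations, the correlations among observed visibility, relative humidity, and PM2.5 were very low. The degraded performance at those stations is therefore attributed to collocation problems or low-quality observation data. In the weekly validation, excluding those stations, the RMSE was a stable 2.5 km. Taken together, these results suggest that a well-performing visibility regression model could be built for all seasons if the low-quality observation data or collocation problems are resolved.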
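The abstract does not say how the under-sampling was carried out, so the following is only a minimal sketch of one common approach for a continuous target: group training records into visibility bins and randomly thin any bin that exceeds a cap. The function name `undersample_by_bin`, the field name `visibility_km`, the 5 km bin width, the cap of 100, and the synthetic records are all illustrative assumptions, not values from the thesis.

```python
import random
from collections import defaultdict


def undersample_by_bin(samples, target_key, bin_width_km, max_per_bin, seed=0):
    """Cap the number of training samples in each visibility bin.

    samples: list of dicts; target_key: name of the visibility field (km).
    The bin width and cap are illustrative assumptions, not thesis values.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for s in samples:
        bins[int(s[target_key] // bin_width_km)].append(s)

    kept = []
    for _, members in sorted(bins.items()):
        # Randomly thin only the bins that exceed the cap.
        if len(members) > max_per_bin:
            members = rng.sample(members, max_per_bin)
        kept.extend(members)
    return kept


# Synthetic, imbalanced data: many clear-sky records, few low-visibility ones.
data_rng = random.Random(42)
records = [{"visibility_km": data_rng.uniform(15.0, 20.0)} for _ in range(900)]
records += [{"visibility_km": data_rng.uniform(0.5, 5.0)} for _ in range(100)]

balanced = undersample_by_bin(
    records, "visibility_km", bin_width_km=5.0, max_per_bin=100
)
low_before = sum(r["visibility_km"] < 5.0 for r in records) / len(records)
low_after = sum(r["visibility_km"] < 5.0 for r in balanced) / len(balanced)
print(f"low-visibility share: {low_before:.2f} -> {low_after:.2f}")
```

Thinning the over-represented high-visibility bins raises the share of low-visibility cases in the training set, which is the mechanism usually credited when accuracy on the rare range improves after under-sampling.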


Abstract (English)

Haze occurs frequently on the Korean Peninsula in spring and winter because of high aerosol concentrations. However, studies that estimate aerosol-driven visibility are scarce. In this study, a visibility regression model was therefore built and validated by applying spring and winter meteorological and PM concentration data to a random forest. Under-sampling was applied to the training data when building the model. To assess the advantage of the random forest regression model, a multiple linear regression model was built from the same data and a comparative validation was performed. After under-sampling, the accuracy of the visibility regression model for low visibility increased. In the comparison of the two regression models, the random forest model showed better validation indices than the multiple linear regression model, confirming its advantage. In the station-by-station validation, the regression model was stable at most stations, but some stations showed poor validation indices. An analysis of the cause showed that the correlations among measured visibility, relative humidity, and PM2.5 were very low at some stations. The performance of the regression model therefore appears to have been degraded at those stations by collocation problems or low-quality measurement data. In the weekly validation, the RMSE was stable at 2.5 km when those stations were excluded. In summary, a well-performing visibility regression model could likely be built for all seasons if the low-quality measurement data or collocation problems are resolved.
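The station-by-station validation described above can be illustrated with a short sketch that computes RMSE per station and flags stations whose error stands out. The row layout, the station IDs, the sample values, and the use of 2.5 km as a flagging threshold are assumptions made for illustration. The abstract reports 2.5 km only as the weekly RMSE after excluding some stations, not as a cutoff.

```python
import math
from collections import defaultdict


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def rmse_by_station(rows):
    """Group (station_id, observed_km, predicted_km) rows and score each station."""
    grouped = defaultdict(lambda: ([], []))
    for station_id, obs, pred in rows:
        grouped[station_id][0].append(obs)
        grouped[station_id][1].append(pred)
    return {sid: rmse(obs, pred) for sid, (obs, pred) in grouped.items()}


# Hypothetical rows; station IDs and values are made up for illustration.
rows = [
    ("A", 10.0, 11.0), ("A", 4.0, 3.0), ("A", 18.0, 17.0),
    ("B", 12.0, 6.0), ("B", 3.0, 9.0),
]
scores = rmse_by_station(rows)

# Assumed threshold (km) for marking a station as worth a closer look.
flagged = [sid for sid, score in scores.items() if score > 2.5]
print(scores, flagged)
```

Scoring each station separately, rather than pooling all rows, is what makes it possible to spot individual stations where collocation or data-quality issues may be degrading the model.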


Table of Contents

Contents
List of Figures
List of Tables
Korean Abstract
Chapter 1. Introduction
Chapter 2. Data and Methods
2.1. Theoretical Background
2.2. Data
2.3. Methods
2.3.1. Data Preprocessing
2.3.2. Building the Visibility Regression Model
2.3.3. Validation Methods for the Visibility Regression Model
Chapter 3. Results
3.1. Under-sampling Results
3.2. Correlation Coefficient Matrix by Variable and Random Forest Variable Importance
3.3. Visibility Regression Model and Validation Results
3.3.1. Multiple Linear and Random Forest Regression Models
3.3.2. Validation on All Data
3.3.3. Station-by-Station Validation
3.3.4. Weekly Validation
3.3.5. Final Validation
Chapter 4. Summary and Conclusions
References
English Abstract
