Abstract:
The Health and Demographic Surveillance System (HDSS) is a data collection system
that tracks key events such as births, deaths, and migrations in well-defined geographic
areas, particularly in low- and middle-income countries. HDSSs track the
life events of approximately three million people in 18 low- and middle-income African,
Asian, and Oceanian nations. Having HDSSs strategically located within a country can
provide a more complete picture of health-related and other social problems affecting
the public. The HDSS monitors vital demographic and health indicators, as well
as other metrics, to help shape national policies and programmes for departments of
basic education, home affairs, social development, and health. However, establishing
HDSSs has been plagued by several difficulties, notably obtaining high-quality
data, largely because of the use of antiquated methods or systems. The cornerstone of a well-functioning
HDSS is high-quality, timely health data, which is often lacking in low- and
middle-income countries. There is a paucity of high-quality, disaggregated data to
monitor health inequities and promote the equitable delivery of health services. HDSSs
are confronted with data quality problems arising from the way data is acquired and managed.
This study addresses these problems by building a data system that integrates
a novel framework known as the 3-Tier Total Data Quality Management Framework
(3TTDQMF). The framework manages the quality of data from the point of collection
through to storage in the database. At the core of the framework is an automated
data quality control methodology that autonomously validates and controls the quality of
the data. Open-source technologies such as Pentaho Data Integration (PDI), an R application
programming interface (R-API), Windows Task Scheduler, and the Bash and Python programming
languages were used to automate data processing and quality control.
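As a minimal illustration of the kind of automated check such a methodology performs, the Python sketch below validates a batch of records before they are loaded into the database; the field names and rules are illustrative assumptions rather than the actual HDSS schema, and a script like this could be launched on a schedule by Windows Task Scheduler:

    # Hypothetical sketch of a scheduled validation step; field names and
    # rules are illustrative, not the study's actual schema.
    import csv
    import sys
    from datetime import datetime

    VALID_SEX = {"male", "female"}  # assumed coding

    def validate_record(row):
        """Return a list of data quality errors found in one record."""
        errors = []
        if not row.get("individual_id"):
            errors.append("missing individual_id")
        if row.get("sex", "").lower() not in VALID_SEX:
            errors.append("invalid sex code")
        try:
            dob = datetime.strptime(row["date_of_birth"], "%Y-%m-%d")
            if dob > datetime.now():
                errors.append("date_of_birth in the future")
        except (KeyError, ValueError):
            errors.append("missing or malformed date_of_birth")
        return errors

    def main(path):
        with open(path, newline="") as f:
            for i, row in enumerate(csv.DictReader(f), start=1):
                for err in validate_record(row):
                    print(f"record {i}: {err}")

    if __name__ == "__main__":
        main(sys.argv[1])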
The experiment was set up on hyper-converged IT infrastructure running the Windows
Server 2016 operating system. The results showed that the proposed approach greatly improved
the overall efficiency of the system and the quality of the data. Efficiency in dealing
with data quality issues was ensured through the implementation of an automated system.
The research evaluated the system’s capacity to generate high-quality data using
measures such as data accuracy, completeness, consistency, timeliness, and validity; one way such metrics can be computed is sketched below.
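As with the validation step, this is a minimal sketch under assumed column names and an assumed 30-day timeliness rule, not the study’s actual metric definitions:

    # Hypothetical sketch of per-batch quality metrics using pandas.
    # Column names and the 30-day timeliness rule are assumptions.
    import pandas as pd

    def quality_metrics(df: pd.DataFrame) -> dict:
        # Completeness: share of non-missing cells in the batch.
        completeness = df.notna().sum().sum() / df.size
        # Validity: share of records with an allowed sex code.
        validity = df["sex"].isin(["male", "female"]).mean()
        # Consistency: death date must not precede birth date.
        dob = pd.to_datetime(df["date_of_birth"], errors="coerce")
        dod = pd.to_datetime(df["date_of_death"], errors="coerce")
        both = dob.notna() & dod.notna()
        consistency = (dod[both] >= dob[both]).mean() if both.any() else 1.0
        # Timeliness: share of records captured within 30 days of the visit.
        delay = (pd.to_datetime(df["date_captured"], errors="coerce")
                 - pd.to_datetime(df["date_of_visit"], errors="coerce")).dt.days
        timeliness = (delay <= 30).mean()
        # Accuracy is omitted here: it requires comparison against a
        # gold-standard source.
        return {"completeness": completeness, "validity": validity,
                "consistency": consistency, "timeliness": timeliness}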
All quality metrics exhibited an increasing trend, indicating that the proposed approach led
to a substantial improvement in data quality. The results further demonstrated that
the use of Pareto analysis and process control techniques in data quality management
can greatly improve the quality of data by identifying and monitoring the causes of data
quality issues; both techniques are sketched below.
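As a final illustration, the sketch below applies both ideas to a hypothetical log of error categories: Pareto analysis ranks the categories to isolate the "vital few" causes behind roughly 80% of the issues, and a p-chart (one common process control tool) gives 3-sigma control limits for monitoring the error proportion of each data batch:

    # Hypothetical sketch: Pareto analysis of logged error categories plus
    # p-chart control limits for monitoring batch error proportions.
    from collections import Counter

    def pareto(error_log, cutoff=0.8):
        """Rank error causes; return the 'vital few' behind ~cutoff of issues."""
        counts = Counter(error_log).most_common()
        total = sum(n for _, n in counts)
        cumulative, vital_few = 0, []
        for cause, n in counts:
            cumulative += n
            vital_few.append(cause)
            if cumulative / total >= cutoff:
                break
        return counts, vital_few

    def p_chart_limits(p_bar, n):
        """3-sigma control limits for a batch error proportion (p-chart)."""
        sigma = (p_bar * (1 - p_bar) / n) ** 0.5
        return max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)

    log = ["missing dob", "missing dob", "invalid sex", "duplicate id",
           "missing dob", "invalid sex", "missing dob"]
    counts, vital_few = pareto(log)
    print(vital_few)                  # e.g. ['missing dob', 'invalid sex']
    print(p_chart_limits(0.05, 500))  # limits for a 500-record batch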