Data interoperability in context: the importance of open-source implementations when choosing open standards

Authors and affiliations

Daniel Kapitan: Dutch Hospital Data; PharmAccess Foundation; Eindhoven University of Technology

Femke Heddema: PharmAccess Foundation

Andre Dekker: Department of Radiation Oncology (Maastro), GROW - Research Institute for Oncology and Reproduction, Maastricht University Medical Center+

Melle Sieswerda: Integral Cancer Registry Netherlands; Maastricht University

Bart-Jan Verhoeff: Expertisecentrum Zorgalgoritmen

Matt Berg: Ona

Abstract

In response to the proposal of Tsafnat et al. to converge towards three open health data standards, this viewpoint provides a critical reflection on the proposed alignment of using openEHR, FHIR and OMOP as the default standards for clinical care and administration, data exchange and longitudinal analysis, respectively. We argue that open standards are a necessary but not sufficient condition to achieve health data interoperability. The ecosystem of open-source implementations needs to be considered when choosing an appropriate standard for a given context. We discuss two specific contexts, namely standardization of i) health data for federated learning, and ii) health data sharing in low- and middle-income countries (LMICs). Specific design principles, practical considerations and implementation choices for these two contexts are described, based on ongoing work in both areas. In the case of federated learning, we observe convergence towards OMOP and FHIR, where the two standards can effectively be used side by side given the availability of mediators between the two. In the case of health information exchanges in LMICs, we see a strong convergence towards FHIR as the primary standard, with as yet limited adoption of OMOP and openEHR. We propose practical guidelines for context-specific adaptation of open standards.

Keywords

OMOP, openEHR, FHIR, secondary use, data platform, digital platform

Open standards are a necessary but not sufficient condition for interoperability

“A paradox of health care interoperability is the existence of a large number of standards with significant overlap among them,” say Tsafnat et al., followed by a call to action towards the health informatics community to put effort into establishing convergence and preventing collision [1]. To do so, they propose to converge on three open standards, namely i) openEHR for clinical care and administration; ii) Fast Healthcare Interoperability Resources (FHIR) for data exchange; and iii) Observational Medical Outcomes Partnership Common Data Model (OMOP) for longitudinal analysis. They argue that open data standards, backed by engaged communities, hold an advantage over proprietary ones and therefore should be chosen as the stepping stones towards achieving true interoperability.

While we support their high-level rationale and intention, we feel their proposed trichotomy does not do justice to details that are crucial in real-world implementations. This viewpoint provides a critical reflection on their proposed framework in three parts. First, we reflect on salient differences between the three open standards from the perspective of the notion of openness of digital platforms [2] and the paradox of open [3]. Subsequently, we outline the importance of the open-source ecosystem by reflecting on our considerations in designing and implementing health data platforms in two specific contexts, namely i) platforms for federated learning on shared health data in high-income countries; and ii) health data platforms for low- and middle-income countries (LMICs). We conclude with practical guidelines for context-specific adaptation of open standards.

Digital platforms require extensibility, availability of complementary components and availability of executable pieces of software

In their editorial, Tsafnat et al. argue that i) the paradox of interoperability of having overlapping standards can be addressed by converging on just three standards; ii) practical and socio-technical considerations are as important as, if not more important than, technical superiority and therefore balancing customizability and rigidity is of the essence; and iii) open standards, backed by engaged communities, hold an advantage over proprietary ones. While we concur with these points, we argue that these are necessary, but not sufficient conditions for convergence of health data standards. Existing research on digital platforms underlines the importance of platform openness, not only in terms of open standards, but also in terms of availability of executable pieces of software, extensibility of the code base and availability of complements to the core technical platform (in this case the health data standard is the core technical platform) [2]. Only when the majority of these aspects of digital platforms are met can we reasonably expect that the digital platform will indeed flourish and be long-lived.

A similar line of reasoning has been put forward by Keller and Tarkowski in what they call the paradox of open, namely that open ecosystems can only flourish if two types of conditions are met [3]. The first condition states that many people need to contribute to the creation of a common resource. “This is the story of Wikipedia, OpenStreetMap, Blender.org, and the countless free software projects that provide much of the internet’s infrastructure.” [3] Indeed, Tsafnat et al. have explicitly taken into account that “an engaged and vibrant community is a major advantage for the longevity of the data standards it uses,” which has informed their proposal to converge towards OMOP, FHIR and openEHR. However, the importance of open-source implementations is somewhat overlooked. This point is only mentioned in passing when Tsafnat et al. reference work done by Reynolds and Wyatt, who already argued in 2011 “… for the superiority of open-source licensing to promote safer, more effective health care information systems. We claim that open-source licensing in health care information systems is essential to rational procurement strategy” [4]. Hence, we extend the line of reasoning of Tsafnat et al. by emphasizing that the availability of executable pieces of software, extensibility of the code base and availability of complementary components are important criteria which need to be explicitly taken into account when choosing which standard to adopt.

The second condition put forward by Keller and Tarkowski is that open ecosystems have proven fruitful when “opening up” is the result of external incentives or requirements, rather than voluntary actions. Examples of such external incentives are “… publicly-funded knowledge production like Open Access academic publications, cultural heritage collections in the Public Domain, Open Educational Resources (OER), and Open Government data.” [3] Another canonical example is the birth of the GSM standard, which was mandated by European legislation [5]. Reflecting on this condition in the context of open health data ecosystems, we observe a salient difference between FHIR vis-a-vis openEHR and OMOP, namely that the former is the only one that has been mandated (or at least strongly recommended) in some jurisdictions. In the US, the Office of the National Coordinator for Health Information Technology (ONC) and the Centers for Medicare and Medicaid Services (CMS) have introduced a steady stream of new regulations, criteria, and deadlines in Health IT that has resulted in significant adoption of FHIR [6]. In India, the open Health Claims Exchange protocol specification, which is based on FHIR, has been mandated by the Indian government as the standard for e-claims handling [7,8]. The African Union recommends that all new implementations and digital health system improvements use FHIR as the primary mechanism for data exchange [9], but does not say anything about the use of, for example, openEHR for administrative point-of-service systems. The upcoming legislation on the European Health Data Space (EHDS) mandates interoperability between electronic health record systems but has not specified which standard is to be used, although FHIR and openEHR have both been mentioned in the legislative discussion.

These external incentives have resulted in a large boost in both commercial and open-source development activities in the FHIR ecosystem. Illustrative of this is the speed with which the Bulk FHIR API has been defined and implemented in almost all major implementations [10,11], and the development of the SQL-on-FHIR specification, which aims to make large-scale analysis of FHIR data accessible to a larger audience and portable between systems [12]. It has also led to more people voluntarily contributing to FHIR-related open-source projects, which has resulted in a wide offering of FHIR components across major technology stacks (Java, Python, .NET), thereby strengthening the first condition. By comparison, OMOP and openEHR have not yet profited from external incentives to spur adoption and thereby grow the ecosystem beyond a certain critical mass. To illustrate this, a search on GitHub on “FHIR” yields 8.2 thousand results, “OMOP or OHDSI” one thousand results, and “openEHR” returns 400 results. A quick scan of the available open-source components listed on the websites of the three governing bodies HL7 [13], OHDSI [14] and openEHR [15] indicates that the ecosystems of FHIR and OMOP have a significantly larger offering of extensible and complementary open-source components than openEHR, although for the latter notable mature open-source implementations are also emerging, such as EHRbase [16].

Hence, we stress that beyond evaluating the intrinsic structure of an open standard and the community that supports the standard, we need to take into account the wider ecosystem of open-source implementations and availability of complementary components. From this wider perspective of the whole ecosystem surrounding the three standards, FHIR stands out as having the most diverse and rich ecosystem because it has been mandated in certain jurisdictions. This is relevant when comparing these standards in real-world implementations. We now turn to two specific use cases where these considerations are at play.

Standardization of health data for federated learning

The current fragmentation in health data is one of the major barriers towards leveraging the potential of medical data for machine learning (ML). Without access to sufficient data, ML will be limited in its application to health improvement efforts and, ultimately, held back from making the transition from research to clinical practice. High-quality health data, obtained from a research setting or a real-world clinical practice setting, is hard to obtain, because health data is highly sensitive and its usage is tightly regulated.

Federated learning (FL) is a learning paradigm that aims to address these issues of data governance and privacy by training algorithms collaboratively without moving (copying) the data itself [17,18]. Based on ongoing work with the PLUGIN healthcare consortium [19], we have detailed an architecture for FL for secondary use of health data for hospitals in the Netherlands. The starting point for this implementation is the set of National Health Data Infrastructure agreements for research, policy and innovation for the Dutch healthcare sector, which were adopted at the beginning of 2024 [20]. Figure 1 shows a high-level reference architecture of the envisioned infrastructure, comprising three areas (multiple use, applications and generic features) and a total of 26 functional components (for details please refer to [20]). One of the prerequisites of this architecture is that organizations that participate in a federation of ‘data stations’ use the same common data model to make the data Findable, Accessible, Interoperable and Reusable (FAIR). These FAIR data stations comprise components 7, 8 and 9 in Figure 1, i.e. the data, metadata and APIs, respectively, through which the data station can be accessed and used.

Figure 1: Reference architecture for the Dutch health data infrastructure for research and innovation [20]
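The core idea of federated analysis described above, computing on data without copying it out of a data station, can be illustrated with a minimal sketch. It is not the PLUGIN or vantage6 implementation: the station names, the length-of-stay values and the functions `summarize` and `combine` are all hypothetical, and the example computes a pooled mean and standard deviation rather than training a model. Each station shares only aggregate sums, never patient-level rows.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class LocalSummary:
    """Aggregate statistics that leave a data station; no patient rows."""
    n: int
    total: float
    total_sq: float


def summarize(values):
    """Runs inside a data station, on that station's own records."""
    return LocalSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def combine(summaries):
    """Runs at the central aggregator, which sees only the summaries."""
    n = sum(s.n for s in summaries)
    mean = sum(s.total for s in summaries) / n
    variance = sum(s.total_sq for s in summaries) / n - mean ** 2
    return n, mean, sqrt(max(variance, 0.0))


# Hypothetical length-of-stay values (days) held by three hospitals.
stations = {
    "hospital_a": [2.0, 4.0, 6.0],
    "hospital_b": [3.0, 5.0],
    "hospital_c": [1.0, 7.0, 4.0, 4.0],
}
summaries = [summarize(values) for values in stations.values()]
n, mean, sd = combine(summaries)
print(f"pooled n={n}, mean={mean:.2f}, sd={sd:.2f}")
```

The pooled result is identical to what a central analysis of all records would give, which is why a shared common data model matters: every station must compute `summarize` over the same, identically defined variable. Real deployments would typically add further safeguards, for example minimum cell counts, before releasing aggregates.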

Following the line of reasoning of Tsafnat et al., OMOP would be the go-to standard for storing the longitudinal data in each of the data stations, where data is transformed from the original source (component 6), stored in a common data model (component 7) and properly annotated with metadata (component 8). Indeed, by now there are quite a few reports of real-world implementations of federated learning networks based on the OHDSI-OMOP stack, including a global infrastructure with 22 centres for COVID-19 prediction models [21], FeederNet in South Korea with 57 participating hospitals [22], Dutch multi-cohort dementia research with 9 centres [23], the European severe heterogeneous asthma research collaboration [24] and the recently initiated Belgian Federated Health Innovation Network (FHIN) [25].

For the PLUGIN project, however, we chose to adopt FHIR because its data model is more compatible with the data models of the clinical administration systems. As PLUGIN focuses on secondary use of routine health data, we feel FHIR is more suitable than OMOP, the latter being more suitable for clinical research data. openEHR might have been an option, too, if more implementations and complementary components had been available. Another reason for choosing FHIR is its practicality and extensibility in a Python-based data science stack, the provision of RESTful APIs out of the box to facilitate easy integration with the container-based vantage6 FL framework, the support of many healthcare terminologies, and flexibility through the profiling mechanism [26-28]. Increasingly, other projects have reported the use of FHIR for persistent, longitudinal storage for FL. The CODA platform, which aims to implement an FL infrastructure in Canada similar to the PLUGIN project, compared OMOP and FHIR and chose the latter as it was found to support the more granular mappings required for analytics [29]. The fair4health project used FHIR as part of a FAIRification workflow to simplify the process of data extraction and preparation for clinical study analyses [30].

Given that OMOP can be conceptually viewed as a strict subset of FHIR, hybrid solutions using OMOP and FHIR combined have also been reported, such as the German KETOS platform [31], and the preliminary findings from the European GenoMed4All project, which aims to connect clinical and -omics data [32]. A collaboration of 10 university hospitals in Germany has shown that standardized ETL processing from FHIR into OMOP can achieve 99% conformance [33], which confirms the feasibility of the solution pattern where FHIR acts as an intermediate sharing standard through which data from (legacy) systems are extracted and made available for reuse in a common data model. One could argue that the distinction between FHIR and OMOP becomes less relevant if data can be effectively stored in either standard. We are hopeful that initiatives like OMOP-on-FHIR will indeed foster convergence rather than collision between these two standards [34].
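To make the FHIR-to-OMOP pattern concrete, the following is a minimal sketch of one mapping step: turning a FHIR R4 Patient resource into a row for the OMOP `person` table. It is an illustration only, not the ETL process reported in [33]. The function name `patient_to_person` and the way `person_id` is supplied are assumptions; the gender concept IDs (8507 and 8532) are the standard OMOP concepts for male and female, and 0 is the OMOP convention for "no matching concept".

```python
# Standard OMOP concept IDs for gender; 0 means "no matching concept".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


def patient_to_person(patient, person_id):
    """Map a FHIR R4 Patient (as a dict) to an OMOP person row (as a dict).

    person_id is an integer surrogate key assigned by the caller, because
    FHIR logical ids are strings while OMOP person_id is an integer.
    """
    if patient.get("resourceType") != "Patient":
        raise ValueError("expected a FHIR Patient resource")

    # FHIR birthDate may be partial: YYYY, YYYY-MM or YYYY-MM-DD.
    birth = patient.get("birthDate")
    parts = [int(p) for p in birth.split("-")] if birth else []
    year, month, day = (parts + [None, None, None])[:3]

    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(patient.get("gender"), 0),
        "year_of_birth": year,  # required in OMOP; None here needs handling upstream
        "month_of_birth": month,
        "day_of_birth": day,
        "race_concept_id": 0,       # not derivable from base FHIR Patient
        "ethnicity_concept_id": 0,  # not derivable from base FHIR Patient
        "person_source_value": patient.get("id"),
        "gender_source_value": patient.get("gender"),
    }


example = {
    "resourceType": "Patient",
    "id": "example-1",
    "gender": "female",
    "birthDate": "1980-05",
}
row = patient_to_person(example, person_id=1)
print(row)
```

Even this small step shows where information is reshaped: FHIR's coded `gender` string becomes a concept ID, and a partial birth date is split across three columns, with the original values kept in the `*_source_value` fields for traceability.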

In the case of PLUGIN, another important consideration for choosing FHIR over OMOP is that, from a data architecture perspective, the mechanism of FHIR Profiles can be tied to the principle of late binding commonly applied in data lake/warehouse architectures (Figure 2): allow ingestion of widely different sources, and gradually add more constraints and validations as you move closer to a specific use case. If machine learning is the primary objective for secondary use, we want to be able to cast a wider net of relevant data, rather than being too restrictive when ingesting the data at the start of the processing pipeline. Late binding in data warehousing is a design philosophy where data transformation and schema enforcement are deferred as late as possible in the data processing pipeline, sometimes even until query time. This approach contrasts with early binding, where data is transformed and structured as it is ingested into the data warehouse. The advantage of this design is that it allows for greater flexibility. During the initial ingestion of the data, we only require the data to conform to the minimal syntactic standard defined by the base FHIR version (R4 in the diagram). As the data is processed, stricter checks and constraints are applied, whereby ultimately different profiles can co-exist next to one another (the two innermost circles) within a larger circle with fewer restrictions. Note that if any of the profiles includes a FHIR extension, such as a field added to record a patient’s maiden name, the profiles are no longer strictly concentric. Hence extra care needs to be taken with extensions when applying the principle of late binding.

Figure 2: Principle of late binding with FHIR profiling mechanism, illustrated with FHIR Profiles that are currently in use in the Netherlands.
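The late-binding principle illustrated in Figure 2 can be sketched in a few lines. The example below is a simplification under stated assumptions: the profile names and their required elements are hypothetical stand-ins, and a "profile" is reduced to a list of top-level elements that must be present. Real FHIR profiles are StructureDefinitions that would be checked with a FHIR validator.

```python
# Hypothetical stand-ins for FHIR profiles: each lists the top-level
# elements that must be present. The profiles are concentric: every
# element required by a looser profile is also required by a stricter one.
PROFILES = {
    "base_r4": ["resourceType"],
    "national_core": ["resourceType", "identifier", "birthDate"],
    "use_case": ["resourceType", "identifier", "birthDate", "gender"],
}


def violations(resource, profile):
    """Return the required elements that are missing or empty."""
    return [e for e in PROFILES[profile] if not resource.get(e)]


def ingest(resources):
    """Ingestion applies only the loosest check (base FHIR)."""
    return [r for r in resources if not violations(r, "base_r4")]


def select(lake, profile):
    """Stricter profiles are bound late, when a use case asks for data."""
    return [r for r in lake if not violations(r, profile)]


incoming = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "Patient", "id": "p2",
     "identifier": [{"value": "123"}], "birthDate": "1970-01-01"},
    {"resourceType": "Patient", "id": "p3",
     "identifier": [{"value": "456"}], "birthDate": "1985-06-30",
     "gender": "female"},
    {"id": "broken"},  # no resourceType: rejected at ingestion
]

lake = ingest(incoming)
core_ids = [r["id"] for r in select(lake, "national_core")]
use_case_ids = [r["id"] for r in select(lake, "use_case")]
print(len(lake), core_ids, use_case_ids)
```

All three syntactically valid resources are retained at ingestion, and each downstream use case narrows the set only as far as it needs. With early binding, by contrast, `p1` and `p2` would have been discarded before any use case had a chance to ask for them.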

We found that this principle of late binding also allows flexible and efficient implementations of the data stations that make use of the current best practices of a lakehouse architecture [35-37] and the composable data stack [38]. Lakehouses typically have a zonal architecture that follows the Extract-Load-Transform (ELT) pattern, where data is ingested from the source systems in bulk (E), delivered to storage with aligned schemas (L) and transformed into a format ready for analysis (T) [35]. The discerning characteristic of the lakehouse architecture is its foundation on low-cost and directly accessible storage that also provides traditional database management and performance features such as ACID transactions, data versioning, auditing, indexing, caching, and query optimization [39]. Lakehouses thus combine the key benefits of data lakes and data warehouses: low-cost storage in an open format accessible by a variety of systems from the former, and powerful management and optimization features from the latter. Explicitly aligning the mechanism of FHIR Profiles with this design pattern of a data lakehouse enables us to use complementary standards and open-source components, most notably Apache Arrow as the standard columnar in-memory format with RPC-based data movement [40]; Apache Parquet as the standard columnar on-disk format [41]; and Apache Iceberg as the open table format [42,43].

One of the key challenges in using FHIR in this way pertains to the need to upgrade the whole ELT pipeline when moving to a new primary FHIR version, for example R6. The potential technical debt of future version upgrades is not specific to FHIR, but because FHIR is a younger standard, changes are more frequent than in OMOP and openEHR. However, we expect that the development time required to upgrade FHIR versions is significantly less than that of the initial migration to FHIR.

The above considerations also show the conceptual difference between FHIR as a health data exchange standard, openEHR as persistent storage of routine healthcare data, and OMOP as persistent storage of health research data. For health data exchange and federated learning, the recipient of the data determines to a large extent what subset of the data available in the source needs to be made available. In other words, the target data model is known late, which favors late binding. In a persistent storage setting, the holder of the source data determines what data needs to be stored (typically everything), which favors early binding.

Health data standards in LMICs

It is a widely held belief that digital technologies have an important role to play in strengthening health systems in LMICs. Yet here, too, the current fragmentation of health data stands in the way of scaling up digital health programmes beyond project-centric, vertical solutions into sustainable health information exchanges [44]. In the context of global digital health developments, Mehl et al. have also called for convergence to open standards, similar to Tsafnat et al., but additionally stress the need for open-source technologies (also our main argument in this paper), open content (representations of public health, health system or clinical knowledge to guide implementations) and open architectures (reusable enterprise architecture patterns for health systems) [45]. As for the open architecture, we see a convergence towards the OpenHIE framework [46], which has been adopted by many sub-Saharan African countries as the architectural blueprint for implementing nation-wide health information exchanges (HIE) [47], including Nigeria [48], Kenya [49] and Tanzania [50]. Figure 3 shows an overview of the OpenHIE architecture.

Figure 3: OpenHIE architecture showing the Point of Service systems (black), the Interoperability Layer (green) and the Component Layer (blue).

While the OpenHIE specification is agnostic to which data standard should be used, in practice the digital health community in LMICs has de facto converged towards FHIR as the primary standard for health information exchange, in line with the proposal by Tsafnat et al. To illustrate this point, consider the OpenHIM Platform architecture (Figure 4), which is currently the largest open-source implementation of the OpenHIE specification. Clients (Point-of-Service systems) can initiate various workflows to submit or query patient data. The Shared Health Record (SHR) acts as the core transactional system for the health information exchange, which in this case is realized with the HAPI FHIR server, one of the most widely used open-source FHIR server implementations [51].

Figure 4: OpenHIM Platform Architecture, illustrating the use of FHIR-based workflows between the components as specified in OpenHIE. CR: Client Registry. IOL: Interoperability Layer. MPI: Master Patient Index. SHR: Shared Health Record. Image taken from https://jembi.gitbook.io/.
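As an illustration of what a Point-of-Service client might submit in such a workflow, the sketch below builds a FHIR transaction Bundle containing a new Patient and an Encounter that refers to it. The Bundle structure (`type`, `entry`, `fullUrl`, `request`) follows the FHIR R4 specification; the helper names and the example patient are hypothetical, and whether a particular OpenHIM deployment accepts exactly this payload is an assumption. The code only constructs the payload and does not send it.

```python
import json
import uuid


def new_urn():
    """Temporary identifier used to link entries within one transaction."""
    return f"urn:uuid:{uuid.uuid4()}"


def transaction_bundle(entries):
    """Build a FHIR transaction Bundle from (full_url, resource) pairs."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {
                "fullUrl": full_url,
                "resource": resource,
                # POST to the resource type asks the server to create it.
                "request": {"method": "POST", "url": resource["resourceType"]},
            }
            for full_url, resource in entries
        ],
    }


patient_urn = new_urn()
patient = {
    "resourceType": "Patient",
    "name": [{"family": "Example", "given": ["Amina"]}],
    "gender": "female",
}
encounter = {
    "resourceType": "Encounter",
    "status": "finished",
    "class": {
        "system": "http://terminology.hl7.org/CodeSystem/v3-ActCode",
        "code": "AMB",
    },
    # Refers to the Patient by its temporary URN; a FHIR server replaces
    # this with the real id when it processes the transaction.
    "subject": {"reference": patient_urn},
}

bundle = transaction_bundle([(patient_urn, patient), (new_urn(), encounter)])
payload = json.dumps(bundle, indent=2)
print(payload)
```

A client would send `payload` in an HTTP POST to the base URL of the FHIR endpoint with content type `application/fhir+json`. Because the whole Bundle is processed as one transaction, either both resources are stored in the SHR or neither is.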

Looking at the Point-of-Service systems, we see that as of today openEHR is rarely used as the standard for clinical administration in LMICs. The largest open-source EHR implementations are based on proprietary data models, and it is unlikely this will change any time soon [52]. Instead, we see that FHIR-native software development frameworks such as OpenSRP [53] and the Open Health Stack [54] are being used more and more. In this approach, health professionals use Android apps to register and collect routine health data (Figure 5). As an example, OpenSRP has been deployed in 14 countries targeting various patient populations, including a reference implementation of the WHO antenatal and neonatal care guidelines for midwives in Lombok, Indonesia [55,56]. This solution design is particularly useful for mid-size and smaller healthcare facilities, which are often resource-constrained and lack the basic IT infrastructure to deploy a full-blown electronic medical record system. Hence, by necessity, the FHIR-based SHR functions as the administrative system-of-record and as the hub for information exchange at the same time.

Figure 5: Overview of OpenSRP2 open-source framework for building clinical administration apps. HIS: health information systems. Image source: https://docs.opensrp.io/.

Finally, regarding longitudinal data analysis, we also see a convergence towards FHIR as the primary standard in LMICs. As in the case of federated learning, FHIR is the preferred choice for implementing data warehouses and analytics platforms because of the widespread availability of complementary open-source technologies. FHIR-specific technologies such as the Bulk FHIR data access and SQL-on-FHIR specifications mentioned earlier allow the FHIR ecosystem to be used, complemented and integrated with generic open-source data warehousing technologies such as ClickHouse [57] and dbt [58].
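The idea behind this combination, flattening nested FHIR resources into tabular views that generic SQL tooling can query, can be sketched as follows. The `VIEW` dictionary loosely mirrors the shape of a SQL-on-FHIR ViewDefinition (a resource type plus named columns with paths), but the `resolve` function is a toy evaluator for simple dotted paths that takes the first element of any repeated field; it is not a FHIRPath implementation. SQLite from the Python standard library stands in for a data warehouse such as ClickHouse, and the patient records are invented.

```python
import sqlite3

# Loosely modelled on a SQL-on-FHIR ViewDefinition; simplified for illustration.
VIEW = {
    "resource": "Patient",
    "select": [{"column": [
        {"name": "id", "path": "id"},
        {"name": "gender", "path": "gender"},
        {"name": "birth_date", "path": "birthDate"},
        {"name": "city", "path": "address.city"},
    ]}],
}


def resolve(node, path):
    """Toy path evaluator: dotted keys, first element of any list."""
    for key in path.split("."):
        if isinstance(node, list):
            node = node[0] if node else None
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    if isinstance(node, list):
        node = node[0] if node else None
    return node


def flatten(resources, view):
    """Turn matching resources into (column names, rows)."""
    columns = [c for s in view["select"] for c in s["column"]]
    rows = [
        tuple(resolve(r, c["path"]) for c in columns)
        for r in resources
        if r.get("resourceType") == view["resource"]
    ]
    return [c["name"] for c in columns], rows


resources = [
    {"resourceType": "Patient", "id": "a", "gender": "female",
     "birthDate": "1990-02-01", "address": [{"city": "Nairobi"}]},
    {"resourceType": "Patient", "id": "b", "gender": "male",
     "birthDate": "1984-11-23", "address": [{"city": "Lagos"}]},
    {"resourceType": "Patient", "id": "c", "gender": "female",
     "birthDate": "2001-07-15", "address": [{"city": "Nairobi"}]},
    {"resourceType": "Observation", "id": "o1", "status": "final"},
]

names, rows = flatten(resources, VIEW)
con = sqlite3.connect(":memory:")
con.execute(f"CREATE TABLE patient_view ({', '.join(names)})")
placeholders = ", ".join("?" * len(names))
con.executemany(f"INSERT INTO patient_view VALUES ({placeholders})", rows)

result = con.execute(
    "SELECT city, COUNT(*) FROM patient_view GROUP BY city ORDER BY city"
).fetchall()
print(result)
```

Once the view is materialized, analysts work with ordinary SQL and do not need to know the nested FHIR structure, which is what makes the approach portable between systems.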

All in all, we see that in the context of LMICs, the three domains of standardization put forward by Tsafnat et al. merge into one. The SHR, as the key component within the OpenHIE specification, serves as the back-end of the system-of-record and provides a transactional, persistent storage engine for information exchange. Downstream longitudinal data stores continue to use FHIR as the common data model for analytical purposes. One could argue that it is in fact advantageous to converge to just one standard, thereby reducing the complexity and cost of the total system. Such a perspective ties in with the hourglass model of layered systems architecture, which has been used in the design of the Internet and Unix and has enabled viral adoption and deployment scalability [59,60]. The hourglass model is “… an approach to design that seeks to support a great diversity of applications (at the top of the hourglass) and allow implementation using a great diversity of supporting services (at the bottom).” [60] The center of the hourglass (the waist, also called the spanning layer in information systems parlance) is defined by a set of minimal standards which mediates all interactions between the higher and lower layers. In the case of the Internet, the spanning layer is defined by the TCP/IP protocol, which is supported by a variety of underlying connectivity services (many different physical networks) and on top of which many different applications can be built (email, videoconferencing etc.).

Within the context of LMICs, we believe that FHIR can act as the spanning layer within the open health data system at large. Because FHIR is inherently designed to make optimal use of internet standards, such as the JSON format and REST APIs, it is very modular and developer-friendly. The many components that make up FHIR allow the standard to be used effectively to implement subsystems, such as a facility registry or a health worker registry. In comparison, OMOP and openEHR are less modular in their design and are thereby less suitable as a standard to implement the subsystems defined, for example, in the OpenHIE specification.
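To give a sense of how little machinery such a subsystem needs, the sketch below treats a facility registry as a set of FHIR Location resources behind a REST search endpoint. The base URL is hypothetical and the response is a hand-written searchset Bundle rather than the output of a real server; `address-city` and `_count` are standard FHIR search parameters. No network call is made.

```python
from urllib.parse import urlencode

BASE_URL = "https://fhir.example.org/fhir"  # hypothetical server


def search_url(base_url, resource_type, params):
    """Build a FHIR RESTful search URL: [base]/[type]?[parameters]."""
    return f"{base_url}/{resource_type}?{urlencode(params)}"


def facilities_from_bundle(bundle):
    """Extract (id, name) pairs from a searchset Bundle of Locations."""
    return [
        (entry["resource"]["id"], entry["resource"].get("name"))
        for entry in bundle.get("entry", [])
        if entry["resource"]["resourceType"] == "Location"
    ]


url = search_url(BASE_URL, "Location", {"address-city": "Mombasa", "_count": 50})

# Hand-written example of what a server could return for the query above.
response = {
    "resourceType": "Bundle",
    "type": "searchset",
    "total": 2,
    "entry": [
        {"resource": {"resourceType": "Location", "id": "loc-1",
                      "name": "Example Health Centre",
                      "address": {"city": "Mombasa"}}},
        {"resource": {"resourceType": "Location", "id": "loc-2",
                      "name": "Example Dispensary",
                      "address": {"city": "Mombasa"}}},
    ],
}

facilities = facilities_from_bundle(response)
print(url)
print(facilities)
```

The same two functions would work unchanged for a health worker registry by swapping `Location` for `Practitioner`, which is one way to read the claim that FHIR's uniform REST and JSON conventions make it suitable as a spanning layer.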

Conclusion

We agree with Tsafnat et al. that there is a dire need to converge to open data standards in healthcare, and support the proposal to focus on openEHR, FHIR and OMOP in healthcare informatics going forward. However, open standards are a necessary but not sufficient condition for the convergence of health data standardization. The availability of open-source implementations and complementary technologies is as important when choosing which open standard to use. Furthermore, we find that the proposed trichotomy is not always relevant and think that the full-STAC approach described by Mehl et al. is more comprehensive. In the case of FL, we see a convergence towards OMOP and FHIR, which can be used interchangeably. In the case of LMICs, we think that FHIR has the potential to act as the spanning layer within the open health data system at large, thereby enabling much wider standardization and adoption. We strongly support ongoing developments to increase the availability of open-source implementations as digital public goods [61] and integration projects such as Instant OpenHIE [62], through which we have a fighting chance to move the needle in health data standardization for LMICs.

Authors’ Contributions

DK contributed to the concept and design of the manuscript and prepared the first draft. AD, MS and BJV contributed to the section on federated learning. FH and MB contributed to the section on LMICs. All authors contributed to the final revision and approved the final manuscript.

Conflicts of interest

DK received funding from PharmAccess as a contractor to conduct the work on LMICs reported here. MB/ONA is the core developer of the open-source OpenSRP 2 framework.

References

1. Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or Collide? Making Sense of a Plethora of Open Data Standards in Health Care. Journal of Medical Internet Research. 2024;26(1):e55779. doi:10.2196/55779
2. de Reuver M, Sørensen C, Basole RC. The Digital Platform: A Research Agenda. Journal of Information Technology. 2018;33(2):124-135. doi:10.1057/s41265-016-0033-3
3. Keller P, Tarkowski A. The Paradox of Open. Open Future. Published online March 5, 2021. Accessed March 25, 2024. https://openfuture.pubpub.org/pub/paradox-of-open/release/1
4. Reynolds CJ, Wyatt JC. Open Source, Open Standards, and Health Care Information Systems. Journal of Medical Internet Research. 2011;13(1):e1521. doi:10.2196/jmir.1521
5. GSM. In: Wikipedia. 2024. Accessed September 20, 2024. https://en.wikipedia.org/w/index.php?title=GSM&oldid=1245675274
6. Firely. FHIR in US Healthcare Regulations. 2023. Accessed May 30, 2024. https://simplifier.net/organization/firely/news/153
7. National Digital Health Mission. India National Health Authority; 2020.
8. HCX Protocol V0.9. 2023. Accessed September 18, 2024. http://hcxprotocol.io/
9. Tilahun B, Mamuye A, Yilma T, Shehata Y. African Union Health Information Exchange Guidelines and Standards. 2023.
10. Mandl KD, Gottlieb D, Mandel JC, et al. Push Button Population Health: The SMART/HL7 FHIR Bulk Data Access Application Programming Interface. npj Digital Medicine. 2020;3(1):1-9. doi:10.1038/s41746-020-00358-4
11. Jones J, Gottlieb D, Mandel JC, et al. A landscape survey of planned SMART/HL7 bulk FHIR data access API implementations and tools. Journal of the American Medical Informatics Association. 2021;28(6):1284-1287. doi:10.1093/jamia/ocab028
12. SQL on FHIR V2.0.0-Pre. Accessed September 20, 2024. https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/
13. FHIR Open Source Implementations. September 20, 2024. Accessed September 20, 2024. https://confluence.hl7.org/display/FHIR/Open+Source+Implementations
14. Software Tools - OHDSI. Accessed September 20, 2024. https://www.ohdsi.org/software-tools/
15. Beale SH Thomas. openEHR Platform. Accessed September 20, 2024. https://openehr.org/products_tools/platform/
16. EHRbase 2.0 website. Published online March 19, 2024. Accessed September 20, 2024. https://www.ehrbase.org/
17. Rieke N, Hancox J, Li W, et al. The future of digital health with federated learning. npj Digital Medicine. 2020;3(1):1-7. doi:10.1038/s41746-020-00323-1
18. Teo ZL, Jin L, Liu N, et al. Federated machine learning in healthcare: A systematic review on clinical applications and technical architecture. Cell Reports Medicine. 2024;5(2):101419. doi:10.1016/j.xcrm.2024.101419
19. PLUGIN - Platform voor Uitwisseling en Hergebruik van Klinische Data Nederland. Accessed September 21, 2024. https://plugin.healthcare/
20. Health-RI. Agreements on the National Health Data Infrastructure for Research, Policy and Innovation - Health-RI Nationale Gezondheidsdata-infrastructuur - Confluence. January 29, 2024. Accessed June 3, 2024. https://health-ri.atlassian.net/wiki/spaces/HNG/pages/249073646/Agreements+on+the+National+Health+Data+Infrastructure+for+Research+Policy+and+Innovation
21. Khalid S, Yang C, Blacketer C, et al. A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data. Computer Methods and Programs in Biomedicine. 2021;211:106394. doi:10.1016/j.cmpb.2021.106394
22. Lee S, Kim C, Chang J, Park RW. FeederNet (Federated E-Health Big Data for Evidence Renovation Network) platform in Korea - OHDSI. 2022. Accessed June 4, 2024. https://www.ohdsi.org/2022showcase-33/
23. Mateus P, Moonen J, Beran M, et al. Data harmonization and federated learning for multi-cohort dementia research using the OMOP common data model: A Netherlands consortium of dementia cohorts case study. Journal of Biomedical Informatics. 2024;155:104661. doi:10.1016/j.jbi.2024.104661
24. Kroes JA, Bansal AT, Berret E, et al. Blueprint for harmonising unstandardised disease registries to allow federated data analysis: Prepare for the future. ERJ Open Research. 2022;8(4). doi:10.1183/23120541.00168-2022
25. Deltomme C, Denturck K, De Jaeger P, et al. Federated Health Innovation Network (FHIN). Published online September 20, 2024. https://www.ohdsi-europe.org/images/symposium-2024/Posters/poster%20OHDSI%20FHIN%20Camille%20Deltomme%20-%20Camille%20Deltomme.pdf
26. Moncada-Torres A, Martin F, Sieswerda M, Van Soest J, Geleijnse G. VANTAGE6: An open source priVAcy preserviNg federaTed leArninG infrastructurE for Secure Insight eXchange. AMIA Annu Symp Proc. 2021;2020:870-877. Accessed September 21, 2024. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075508/
27. Choudhury A, van Soest J, Nayak S, Dekker A. Personal Health Train on FHIR: A Privacy Preserving Federated Approach for Analyzing FAIR Data in Healthcare. In: Bhattacharjee A, Borgohain SKr, Soni B, Verma G, Gao XZ, eds. Machine Learning, Image Processing, Network Security and Data Sciences. Communications in Computer and Information Science. Springer; 2020:85-95. doi:10.1007/978-981-15-6315-7_7
28. Smits D, Van Beusekom B, Martin F, Veen L, Geleijnse G, Moncada-Torres A. An Improved Infrastructure for Privacy-Preserving Analysis of Patient Data. In: Mantas J, Gallos P, Zoulias E, et al., eds. Studies in Health Technology and Informatics. IOS Press; 2022. doi:10.3233/SHTI220682
29. Mullie L, Afilalo J, Archambault P, et al. CODA: An open-source platform for federated analysis and machine learning on distributed healthcare data. Journal of the American Medical Informatics Association. Published online December 21, 2023:ocad235. doi:10.1093/jamia/ocad235
30. Sinaci AA, Gencturk M, Alvarez-Romero C, et al. Privacy-preserving federated machine learning on FAIR health data: A real-world application. Computational and Structural Biotechnology Journal. 2024;24:136-145. doi:10.1016/j.csbj.2024.02.014
31. Gruendner J, Schwachhofer T, Sippl P, et al. KETOS: Clinical decision support and machine learning as a service – A training and deployment platform based on Docker, OMOP-CDM, and FHIR Web Services. PLOS ONE. 2019;14(10):e0223010. doi:10.1371/journal.pone.0223010
32. Cremonesi F, Planat V, Kalokyri V, et al. The need for multimodal health data modeling: A practical approach for a federated-learning healthcare platform. Journal of Biomedical Informatics. 2023;141:104338. doi:10.1016/j.jbi.2023.104338
33.
Peng Y, Henke E, Reinecke I, Zoch M, Sedlmayr M, Bathelt F. An ETL-process design for data harmonization to participate in international research with German real-world data based on FHIR and OMOP CDM. International Journal of Medical Informatics. 2023;169:104925. doi:10.1016/j.ijmedinf.2022.104925
34.
OMOPonFHIR. Accessed September 20, 2024. https://omoponfhir.org/
35.
Hai R, Koutras C, Quix C, Jarke M. Data Lakes: A Survey of Functions and Systems. IEEE Transactions on Knowledge and Data Engineering. 2023;35(12):12571-12590. doi:10.1109/TKDE.2023.3270101
36.
Harby AA, Zulkernine F. From Data Warehouse to Lakehouse: A Comparative Review. In: 2022 IEEE International Conference on Big Data (Big Data). IEEE; 2022:389-395. doi:10.1109/BigData55660.2022.10020719
37.
Harby AA, Zulkernine F. Data Lakehouse: A Survey and Experimental Study. doi:10.2139/ssrn.4765588
38.
Pedreira P, Erling O, Karanasos K, et al. The Composable Data Management System Manifesto. Proc VLDB Endow. 2023;16(10):2679-2685. doi:10.14778/3603581.3603604
39.
Armbrust M, Ghodsi A, Xin R, Zaharia M. Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. In: Conference on Innovative Data Systems Research (CIDR); 2021:8.
40.
Apache Arrow. 2024. Accessed September 20, 2024. https://arrow.apache.org/
41.
Apache Parquet. 2024. Accessed September 20, 2024. https://parquet.apache.org/
42.
Jain P, Kraft P, Power C, Das T, Stoica I, Zaharia M. Analyzing and Comparing Lakehouse Storage Systems. Published online 2023.
43.
Apache Iceberg. Accessed September 20, 2024. https://iceberg.apache.org/
44.
Karamagi HC, Muneene D, Droti B, et al. eHealth or e-Chaos: The use of Digital Health Interventions for Health Systems Strengthening in sub-Saharan Africa over the last 10 years: A scoping review. J Glob Health. 2022;12:04090. doi:10.7189/jogh.12.04090
45.
Mehl GL, Seneviratne MG, Berg ML, et al. A full-STAC remedy for global digital health transformation: Open standards, technologies, architectures and content. Oxford Open Digital Health. 2023;1:oqad018. doi:10.1093/oodh/oqad018
46.
OpenHIE Framework V5.2-En. 2024. Accessed August 27, 2024. https://ohie.org/
47.
Mamuye AL, Yilma TM, Abdulwahab A, et al. Health information exchange policy and standards for digital health systems in Africa: A systematic review. PLOS Digital Health. 2022;1(10):e0000118. doi:10.1371/journal.pdig.0000118
48.
Dalhatu I, Aniekwe C, Bashorun A, et al. From Paper Files to Web-Based Application for Data-Driven Monitoring of HIV Programs: Nigeria’s Journey to a National Data Repository for Decision-Making and Patient Care. Methods Inf Med. 2023;62(03/04):130-139. doi:10.1055/s-0043-1768711
49.
Thaiya MS, Julia K, Joram M, Benard M, Nambiro DA. Adoption of ICT to Enhance Access to Healthcare in Kenya. IOSR-JCE. 2021;23(2):45-50.
50.
Nsaghurwe A, Dwivedi V, Ndesanjo W, et al. One country’s journey to interoperability: Tanzania’s experience developing and implementing a national health information exchange. BMC Medical Informatics and Decision Making. 2021;21(1):139. doi:10.1186/s12911-021-01499-6
51.
HAPI FHIR - The Open Source FHIR API for Java. Accessed September 20, 2024. https://hapifhir.io/
52.
Syzdykova A, Malta A, Zolfo M, Diro E, Oliveira JL. Open-Source Electronic Health Record Systems for Low-Resource Settings: Systematic Review. JMIR Medical Informatics. 2017;5(4):e44. doi:10.2196/medinform.8131
53.
Mehl G. Open Smart Register Platform (OpenSRP). OpenSRP. 2020;5:42-43. Accessed January 21, 2023. https://lib.digitalsquare.io/handle/123456789/77592
54.
Open Health Stack. Accessed September 20, 2024. https://developers.google.com/open-health-stack
55.
Summit Institute for Development. BUNDA App. May 9, 2023. Accessed January 18, 2024. https://www.sid-indonesia.org/post/bunda-app
56.
Kurniawan K, FitriaSyah I, Jayakusuma AR, et al. Midwife service coverage, quality of work, and client health improved after deployment of an OpenSRP-driven client management application in Indonesia. In: Atlantis Press; 2019:155-162. doi:10.2991/ichs-18.2019.21
57.
ClickHouse. ClickHouse: Fast Open-Source OLAP DBMS. Accessed September 20, 2024. https://clickhouse.com
58.
dbt. Accessed September 20, 2024. https://www.getdbt.com/index
59.
Estrin D, Sim I. Health care delivery. Open mHealth architecture: An engine for health care innovation. Science. 2010;330(6005):759-760. doi:10.1126/science.1196187
60.
Beck M. On the hourglass model. Communications of the ACM. 2019;62(7):48-57. doi:10.1145/3274770
61.
Digital Public Goods Alliance. 2024. Accessed February 5, 2024. https://digitalpublicgoods.net/
62.
Instant OpenHIE V2. Published online July 3, 2024. Accessed September 20, 2024. https://jembi.gitbook.io/instant-v2/