4. NUCLEAR KNOWLEDGE MANAGEMENT AND SEMANTIC TECHNOLOGIES
4.3. Examples of ongoing and proposed applications
A knowledge discovery process comprises data cleaning, data integration, data selection, data transformation, data modelling, pattern evaluation and knowledge presentation. Knowledge discovery techniques analyse data and may uncover important patterns, contributing significantly to knowledge bases and scientific research. The process needs to handle relational as well as other, diverse types of data in order to extract meaningful information. With a suitable knowledge representation, real world knowledge can then be used for problem solving and reasoning.
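The stages listed above can be chained into a simple pipeline. The following Python sketch is illustrative only: the two record sources, their field names and the support threshold are hypothetical, and ‘data modelling’ is reduced to counting co-occurring terms, which is just one of many possible modelling techniques.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two separate sources (illustrative only)
source_a = [{"id": 1, "terms": " Pump ; Valve "}, {"id": 2, "terms": "pump;seal"}]
source_b = [{"id": 3, "terms": "Valve;Pump"}, {"id": 4, "terms": ""}]

def clean(records):
    # Data cleaning: normalize case and whitespace, drop records without terms
    out = []
    for r in records:
        terms = {t.strip().lower() for t in r["terms"].split(";") if t.strip()}
        if terms:
            out.append({"id": r["id"], "terms": terms})
    return out

def integrate(*sources):
    # Data integration: merge the cleaned sources into one collection
    return [r for src in sources for r in src]

def select(records, min_terms=2):
    # Data selection: keep only records that can contain a pair of terms
    return [r for r in records if len(r["terms"]) >= min_terms]

def model(records):
    # Transformation and modelling: count co-occurring term pairs
    pairs = Counter()
    for r in records:
        pairs.update(combinations(sorted(r["terms"]), 2))
    return pairs

def evaluate(pairs, n_records, min_support=0.5):
    # Pattern evaluation: keep pairs whose support reaches the threshold
    return {p: c / n_records for p, c in pairs.items() if c / n_records >= min_support}

def present(patterns):
    # Knowledge presentation: human readable statements
    return [f"{a} and {b} co-occur in {s:.0%} of records"
            for (a, b), s in sorted(patterns.items())]

records = select(integrate(clean(source_a), clean(source_b)))
for line in present(evaluate(model(records), len(records))):
    print(line)
```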
The application of hybrid soft computing methodologies, including neural networks, fuzzy systems and evolutionary computing, makes it possible to extract the knowledge contained in data sets and to express it in multiple ways. Such hybrid approaches can improve accuracy and generalization capability when dealing with high dimensional, complex regression and classification problems.
Knowledge discovery tools and models help to discover and extract meaningful knowledge, in the form of human interpretable relationships and patterns, that is buried in structured or unstructured document collections, and to represent the knowledge components in ways that facilitate operations such as storage, retrieval, inference and reasoning.
For nuclear organizations, the essential value of these methods lies in their capability to extract new insights and knowledge from the huge amount of data continuously produced by all information sources. Correlating data sources that are currently treated separately can produce novel, previously unidentified information (e.g. in the analysis of events or in crisis management), as the ever growing number of big data applications in many scientific and technical domains attests.
grow to very large dimensions. There, the features mentioned in the previous paragraphs on automated indexing, tagging and semantic search will reveal their full potential in improving search, retrieval and knowledge discovery.
Today’s portal software products and content management systems support semantic capabilities to some extent. While provision is usually made for term management to support manual tagging, features such as auto-tagging, semantic search, RDF parsers or linked data connectivity are not yet commonly available. However, third party tools can fill these gaps.
As an example of such an architecture, prototype knowledge portals are being developed at the IAEA that link the document libraries of Microsoft SharePoint with the extraction service of the tool used for managing KOSs (PoolParty Server with PowerTagging extension54). When a document is uploaded, it is sent to the extraction service, which returns the extracted concepts; these are then written to the library’s metadata fields.
A semantic search then gives these metadata high priority, which improves the search results and their ranking, and provides facets for refining the search based on the concepts in the metadata fields.
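The upload, tagging and search flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SharePoint or PoolParty interface: `extract_concepts` is a hypothetical stand-in for the extraction service that simply matches the labels of a small, invented taxonomy, and the weighting of metadata matches over full-text matches (`metadata_weight`) is an assumed parameter.

```python
import re
from collections import Counter

# Hypothetical taxonomy: concept label -> concept identifier
TAXONOMY = {
    "knowledge management": "KnowledgeManagement",
    "knowledge loss": "KnowledgeLoss",
    "succession planning": "SuccessionPlanning",
    "ageing workforce": "AgeingWorkforce",
}

def extract_concepts(text):
    """Stand-in for the extraction service: find taxonomy labels in the text."""
    lowered = text.lower()
    return sorted({cid for label, cid in TAXONOMY.items()
                   if re.search(r"\b" + re.escape(label) + r"\b", lowered)})

library = []  # documents as dicts: title, text, concepts (the metadata field)

def upload(title, text):
    # On upload, the extracted concepts are written to the metadata field
    library.append({"title": title, "text": text, "concepts": extract_concepts(text)})

def search(query, metadata_weight=3.0):
    """Rank documents, weighting concept matches in the metadata above text matches."""
    wanted = set(extract_concepts(query))
    words = query.lower().split()
    hits = []
    for doc in library:
        meta_hits = len(wanted & set(doc["concepts"]))
        text_hits = sum(doc["text"].lower().count(w) for w in words)  # naive substring count
        score = metadata_weight * meta_hits + text_hits
        if score > 0:
            hits.append((score, doc["title"]))
    hits.sort(key=lambda h: (-h[0], h[1]))
    titles = [t for _, t in hits]
    # Facets: how often each concept occurs in the metadata of the matching documents
    facets = Counter(c for d in library if d["title"] in titles for c in d["concepts"])
    return titles, facets

upload("Report A", "Succession planning reduces knowledge loss in an ageing workforce.")
upload("Report B", "An overview of knowledge management methods.")
titles, facets = search("knowledge loss")
```

Report A ranks first because its metadata contain the concept found in the query, while Report B only matches the word ‘knowledge’ in its text; the facet counts could then be offered to the user for narrowing the result list.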
4.3.1.2. Wikis
Another way of making knowledge bases valuable to users is to publish them in the form of a wiki. As a guiding principle, a wiki page is assigned to each concept in the knowledge base. Some items (e.g. the definition of a concept or links to related concepts) may be derived, preferably automatically, from the underlying KOS; other page sections are open to authors to contribute further information, images and links related to the concept. The hierarchy of pages thus reflects the taxonomic structure of the knowledge base. This approach allows for high flexibility in providing content (a wiki is usually used collaboratively) and offers an easy means of maintaining a structure that supports the organization of content, navigation and retrieval. Further semantic features may be added through extensions of the wiki functionality that enrich the text with semantic annotations and relations [49].
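A minimal sketch of generating one page per concept, assuming a KOS held as a Python dictionary with hypothetical identifiers, labels and `broader`/`related` links. The page path follows the chain of broader concepts so that the page hierarchy mirrors the taxonomy; the generated sections and the MediaWiki-style link syntax are assumptions, and the last section is left open for authors.

```python
# Hypothetical KOS fragment: concept id -> label, definition, broader concept, related concepts
KOS = {
    "nkm": {"label": "Nuclear knowledge management", "broader": None,
            "definition": "Definition maintained in the KOS.", "related": []},
    "kpreservation": {"label": "Knowledge preservation", "broader": "nkm",
                      "definition": "Definition maintained in the KOS.", "related": ["ktransfer"]},
    "ktransfer": {"label": "Knowledge transfer", "broader": "nkm",
                  "definition": "Definition maintained in the KOS.", "related": ["kpreservation"]},
}

def page_path(cid):
    # Walk up the broader chain so the page hierarchy mirrors the taxonomy
    parts = []
    while cid is not None:
        parts.append(KOS[cid]["label"])
        cid = KOS[cid]["broader"]
    return "/".join(reversed(parts))

def render_page(cid):
    c = KOS[cid]
    related = ", ".join(f"[[{page_path(r)}|{KOS[r]['label']}]]" for r in c["related"])
    return "\n".join([
        f"= {c['label']} =",
        "== Definition ==  <!-- generated from the KOS -->",
        c["definition"],
        "== Related concepts ==  <!-- generated from the KOS -->",
        related or "(none)",
        "== Further information ==  <!-- open to authors -->",
        "",
    ])

pages = {page_path(cid): render_page(cid) for cid in KOS}
```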
The wiki of the nuclear knowledge management (NKM) section is an example of using a knowledge base (the NKM knowledge base) to build a wiki.55 So far, the definitions are synchronized automatically between the KOS management tool (with which the KOS is maintained) and the wiki. Future extensions will deal with synchronizing newly created and deleted pages between the management tool and the wiki.
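Synchronizing definitions, together with the planned handling of created and deleted pages, amounts to comparing two mappings. The sketch below assumes that both sides can be exported as dictionaries from concept identifiers to definition text; the function name `plan_sync` is hypothetical, and no real KOS management tool or wiki API is called.

```python
def plan_sync(kos_defs, wiki_defs):
    """Compare concept definitions held in the KOS with those shown on the wiki.

    Both arguments map a concept identifier to its definition text.
    Returns (updates, pages_to_create, pages_to_delete).
    """
    # Definitions that changed in the KOS and must be pushed to existing wiki pages
    updates = {c: d for c, d in kos_defs.items() if c in wiki_defs and wiki_defs[c] != d}
    # Planned extension: concepts without a page, and pages without a concept
    to_create = sorted(set(kos_defs) - set(wiki_defs))
    to_delete = sorted(set(wiki_defs) - set(kos_defs))
    return updates, to_create, to_delete

updates, to_create, to_delete = plan_sync(
    {"a": "new text", "b": "same", "c": "only in KOS"},
    {"a": "old text", "b": "same", "d": "only on wiki"},
)
```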
4.3.1.3. Custom linked data applications
Where a knowledge portal or a wiki does not fulfil the needs of an application, a custom linked data application can be created. Several frameworks, both open source and commercial, exist for the development of semantic applications. Like any other Web application development platform, these frameworks leave developers free to define the required functionality, appearance and user interaction; in addition, however, they provide an API for accessing repositories that comply with linked data principles and for issuing SPARQL queries against them. All data retrieved from (possibly federated) data sources can be processed by the application using standard Web programming techniques and presented to the user through the graphical user interface.
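As an illustration of the query access that such frameworks wrap, the sketch below issues a SPARQL query over HTTP following the SPARQL 1.1 Protocol (the query is sent as the `query` parameter of a GET request) and reads the SPARQL 1.1 Query Results JSON Format, using only the Python standard library. The endpoint URL is hypothetical, and real frameworks typically add authentication, POST support and error handling.

```python
import json
import urllib.parse
import urllib.request

def build_sparql_request(endpoint, query):
    # SPARQL 1.1 Protocol: query via HTTP GET, asking for JSON results
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

def parse_bindings(payload):
    # SPARQL 1.1 Query Results JSON Format -> list of {variable: value};
    # unbound variables are simply absent from a row
    data = json.loads(payload)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]

def run_query(endpoint, query, timeout=30):
    with urllib.request.urlopen(build_sparql_request(endpoint, query), timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE { ?concept skos:prefLabel ?label } LIMIT 10
"""

# Usage against a hypothetical endpoint:
# rows = run_query("https://example.org/sparql", QUERY)
```

The rows come back as plain dictionaries, so the application can process and display them with ordinary Web programming means.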
4.3.2. Interlinking and comparing KOSs
Publishing KOSs as linked data makes it straightforward to compare and interconnect them.
As KOSs on the same or similar topics are often developed independently, some overlap in their terms is unavoidable. However, the same terms might be defined and described differently. In the nuclear domain, this is of particular importance for safety related, authoritative vocabularies, where divergent definitions of terms might lead to conflicting interpretations by different parties. This does not necessarily imply that the definitions have to be harmonized: the differences in definitions are often well justified (e.g. when a term is used in another context).
54 See https://www.poolparty.biz/ and https://www.poolparty.biz/poolparty-powertagging/
55 See http://wiki-nkm.iaea.org
An initiative has been started at the IAEA to collect such authoritative vocabularies, to search them for identical or very similar terms and to publish the corresponding definitions and descriptions side by side. No attempt is made to harmonize the definitions: as mentioned, there may be good reasons for the diversity, and in any case the decision to modify a definition is left to the authors.
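The search for identical or very similar terms can be approximated with a string similarity measure. The sketch below applies `difflib.SequenceMatcher` from the Python standard library to normalized terms; the vocabularies, their definitions and the 0.85 threshold are hypothetical, and the actual initiative may use a different matching method. In line with the approach described, matching pairs are only listed side by side, never merged.

```python
from difflib import SequenceMatcher

# Hypothetical vocabularies: term -> definition (placeholders)
VOCAB_A = {"Safety culture": "Definition A1", "Defence in depth": "Definition A2"}
VOCAB_B = {"safety culture": "Definition B1", "Defense in depth": "Definition B2",
           "Criticality": "Definition B3"}

def normalize(term):
    # Case and whitespace differences should not hide a match
    return " ".join(term.lower().split())

def side_by_side(vocab_a, vocab_b, threshold=0.85):
    """Pair identical or very similar terms; the definitions are listed, not harmonized."""
    rows = []
    for term_a, def_a in vocab_a.items():
        for term_b, def_b in vocab_b.items():
            ratio = SequenceMatcher(None, normalize(term_a), normalize(term_b)).ratio()
            if ratio >= threshold:
                rows.append((term_a, def_a, term_b, def_b, round(ratio, 2)))
    return rows

rows = side_by_side(VOCAB_A, VOCAB_B)
```

Here ‘Defence in depth’ and ‘Defense in depth’ are paired despite the spelling difference, while ‘Criticality’ has no counterpart.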
The envisaged availability in the near future of many KOSs covering different areas of the nuclear field opens up the prospect of interlinking them to produce knowledge models with very broad scopes. Since they will be developed independently, these KOSs will undoubtedly overlap. This situation is common on the Web, where many KOSs are published on similar subjects, and W3C standards provide the means to deal with it. KOSs may be linked by declaring concepts in different KOSs to be the same (owl:sameAs), or by assigning them SKOS mapping relations such as exact match (skos:exactMatch), close match (skos:closeMatch), broader or narrower match (skos:broadMatch, skos:narrowMatch) or the generic related match (skos:relatedMatch). This allows for building large, extensible networks of KOSs, which, through appropriate harvesting and querying, can be treated as a single knowledge model.
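The mapping relations above can be written as RDF triples. The sketch below, with hypothetical concept URIs, serializes mappings as N-Triples using the SKOS and OWL property URIs, and then groups concepts connected by skos:exactMatch or owl:sameAs, both of which are transitive, into equivalence classes that a harvester could treat as single concepts. skos:closeMatch is deliberately not followed, because SKOS does not define it as transitive.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

MAPPING_PROPERTIES = {
    "exact": SKOS + "exactMatch",
    "close": SKOS + "closeMatch",
    "broader": SKOS + "broadMatch",
    "narrower": SKOS + "narrowMatch",
    "related": SKOS + "relatedMatch",
    "same": OWL_SAME_AS,
}

def to_ntriples(mappings):
    # mappings: list of (subject URI, mapping kind, object URI)
    return "\n".join(f"<{s}> <{MAPPING_PROPERTIES[k]}> <{o}> ." for s, k, o in mappings)

def equivalence_groups(mappings):
    """Union-find over the transitive links (exactMatch, sameAs) only."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, k, o in mappings:
        if k in ("exact", "same"):
            parent[find(s)] = find(o)
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())

A, B, C = "http://example.org/kosA/", "http://example.org/kosB/", "http://example.org/kosC/"
mappings = [
    (A + "reactor", "exact", B + "reactor"),
    (B + "reactor", "exact", C + "nuclear-reactor"),
    (A + "fuel", "close", B + "nuclear-fuel"),
]
```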
4.3.3. Competency networks
The potential of semantic technologies lies foremost in their ability to connect disparate information.
In this way, new insight may be gained into systems that interoperate in complex ways. One example is competency networks, which involve several factors: industries looking for qualified staff and advertising their positions through job descriptions; the competencies required for given tasks; taxonomies describing the required skills and knowledge; and training centres and academic institutions providing the education and training needed to acquire those skills and knowledge. Combining these aspects may help to identify existing gaps between job requirements and education, to forecast future needs for qualified staff and to direct education resources to where they are needed.
Semantic technologies provide the means of establishing this connection between areas of interest. Each of these areas is already well documented in tables, lists and texts, and in some instances also by means of KOSs. Correlating these information sources first requires the specification of a common vocabulary. Existing documentation would then be converted to RDF, and the combination with existing knowledge models can be achieved through mappings based on the common vocabulary.
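A minimal sketch of the gap analysis described above, assuming that job descriptions and course offerings have already been reduced to lists of competency labels. A hypothetical common vocabulary maps differing local labels (e.g. ‘RP’ and ‘Radiation protection’) onto shared concept identifiers; plain Python dictionaries stand in for the RDF data and mappings, and all names and data are invented for illustration.

```python
# Hypothetical common vocabulary: local label (lower case) -> shared competency concept
COMMON = {
    "radiation protection": "comp:RadiationProtection",
    "rp": "comp:RadiationProtection",
    "reactor physics": "comp:ReactorPhysics",
    "core physics": "comp:ReactorPhysics",
    "i&c maintenance": "comp:ICMaintenance",
}

# Hypothetical source data, each using its own labels
job_requirements = {
    "Shift engineer": ["Core physics", "RP"],
    "I&C technician": ["I&C maintenance", "Radiation protection"],
}
course_offerings = {
    "MSc Nuclear Engineering": ["Reactor physics"],
    "RP basic course": ["Radiation protection"],
}

def to_concepts(labels):
    # Map source-specific labels onto the common vocabulary; unmapped labels stay visible
    return {COMMON.get(label.lower(), "unmapped:" + label) for label in labels}

def gaps(jobs, courses):
    """Competencies required by each job that no course in the network offers."""
    offered = set().union(*(to_concepts(topics) for topics in courses.values()))
    result = {}
    for job, required in jobs.items():
        missing = to_concepts(required) - offered
        if missing:
            result[job] = sorted(missing)
    return result
```

Without the common vocabulary, ‘Core physics’ and ‘Reactor physics’ would look like a gap; after mapping, only the missing I&C maintenance training is reported.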
While the need for such systems, which align organizations’ requirements for qualified staff with appropriate education and training, is not unique to the nuclear field, it carries particular weight in this domain: safety implications are present at every stage of design and operation and require specific consideration in all nuclear related activities.
4.3.4. Education and training networks
Within the scope of fostering education and training, the IAEA supports the creation and advancement of networks such as ANENT (Asian Network for Education in Nuclear Technology), LANENT (Latin American Network for Education in Nuclear Technology), AFRA-NEST (African Network for Education in Science and Technology) and STAR-NET (Regional Network for Education and Training in Nuclear Technology).56 Their primary objectives are to assist member countries in building capacity and to develop human and scientific infrastructure through cooperation in education, nuclear knowledge management and related research and training in nuclear technology. The system is highly distributed, with academic courses, training material and support offered within the network by partners in several locations. To provide an overview of all these resources, an information system is needed that combines detailed descriptions of the resources with information on how to access them. For example, materials should be related to context and subject by topical tags; links to related materials should be provided; and courses should be described by dates, locations, lecturers, required prior certificates, profiles of the people involved (lecturers, students) and more. While a ‘classical’ database might provide this information, the features enabled by a semantic approach, such as a decentralized structure of the information system, the flexibility to change and extend a common scheme, and the interconnection of related information according to linked data principles, offer significant advantages in providing a comprehensive, up to date overview of the network.
56 See https://www.iaea.org/topics/nuclear-knowledge-management/nuclear-education-networks
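To illustrate the flexibility argument, the sketch below describes network resources as subject-predicate-object triples held in a Python set and queried with wildcard patterns, loosely mimicking SPARQL triple patterns. All identifiers and values are hypothetical. Note how a new kind of statement (here, the teaching language) can be added without any schema change.

```python
# Hypothetical resources of a distributed network as (subject, predicate, object) triples
triples = {
    ("course:RP101", "type", "Course"),
    ("course:RP101", "topic", "topic:RadiationProtection"),
    ("course:RP101", "location", "city:Example"),
    ("course:RP101", "lecturer", "person:Lecturer1"),
    ("course:RP101", "requires", "cert:BasicSafety"),
    ("material:RP-slides", "topic", "topic:RadiationProtection"),
    ("material:RP-slides", "relatedTo", "course:RP101"),
}

def match(s=None, p=None, o=None):
    # None acts as a wildcard, like a variable in a SPARQL triple pattern
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

def resources_on(topic):
    # All courses and materials tagged with a given topic
    return sorted({s for s, _, _ in match(p="topic", o=topic)})

# A partner adds a new kind of statement; no table definition has to change
triples.add(("course:RP101", "language", "English"))
```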
4.3.5. Knowledge and learning objects repositories (K/LORs)
Repositories of learning objects that can be accessed remotely, allowing learning material to be reused and reorganized into new educational formats, are especially important in e-learning environments. The IAEA has therefore initiated a project to develop a prototype K/LOR intended for the dissemination and preservation of knowledge and open educational resources. The repository has been integrated with a KOS management tool, using the Web service offered by the KOS manager for concept extraction. When a document is uploaded to the repository, its full text is analysed for keywords present in a given taxonomy. The tagging process has also been extended to multimedia material, which plays an important role in education, in particular videos of lectures. This is achieved by using tools that transcribe spoken language to text and then submitting the transcribed text to the Web service to obtain tags in the same way as for textual objects.
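The tagging path for textual and multimedia objects can be sketched as a small dispatch step. In this sketch, `tag_text` is a hypothetical stand-in for the KOS manager’s concept extraction Web service and `transcribe` is a placeholder for a speech-to-text tool (it merely decodes bytes so that the example runs); the file extensions treated as video and the taxonomy are assumptions.

```python
import re

# Hypothetical taxonomy used for tagging: label -> concept identifier
LOR_TAXONOMY = {
    "fuel cycle": "FuelCycle",
    "waste management": "WasteManagement",
    "reactor safety": "ReactorSafety",
}

def tag_text(text):
    """Stand-in for the concept extraction Web service of the KOS manager."""
    lowered = text.lower()
    return sorted({cid for label, cid in LOR_TAXONOMY.items()
                   if re.search(r"\b" + re.escape(label) + r"\b", lowered)})

def transcribe(media_bytes):
    """Hypothetical placeholder for a speech-to-text tool.

    Here the 'recording' is simply treated as UTF-8 text so the sketch runs.
    """
    return media_bytes.decode("utf-8")

def tags_on_upload(filename, content):
    # Lecture videos are transcribed first; textual objects go to the tagger directly
    if filename.lower().endswith((".mp4", ".webm")):
        text = transcribe(content)
    else:
        text = content.decode("utf-8")
    return tag_text(text)
```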
Integrating these platforms through service consumption makes it possible to expose the metadata as RDF. As demonstrated in the prototype, a SPARQL endpoint can thus be implemented that offers queries not only over all of the repository material but also over external sources of information. The development and implementation of the prototype are described in more detail as a use case in the Annex.
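Exposing repository metadata as RDF can be as simple as emitting a few triples per item. The sketch below writes N-Triples using the Dublin Core terms `title` and `subject`; that choice of vocabulary and the example URIs are assumptions for illustration, not necessarily what the prototype uses. Loaded into a triple store, such data could be served through a SPARQL endpoint alongside external sources.

```python
DCTERMS = "http://purl.org/dc/terms/"

def escape_literal(value):
    # N-Triples string literals may not contain raw '"', '\', line feed or carriage return
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))

def item_to_ntriples(item_uri, title, concept_uris):
    """One title triple plus one subject triple per extracted concept."""
    lines = [f'<{item_uri}> <{DCTERMS}title> "{escape_literal(title)}" .']
    lines += [f"<{item_uri}> <{DCTERMS}subject> <{c}> ." for c in concept_uris]
    return "\n".join(lines)

nt = item_to_ntriples(
    "http://example.org/lor/item/42",          # hypothetical item URI
    'Lecture "Fuel cycle"',
    ["http://example.org/kos/FuelCycle"],      # hypothetical concept URI from the KOS
)
```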
4.3.6. Extracting lessons learned from operational experience and events analysis
Feedback from operating experience (OPEX) is one of the key means of maintaining and improving the safety and reliability of NPP operations and of other nuclear organizations. Industries as diverse as aeronautics, chemicals, pharmaceuticals and explosives all depend on OPEX feedback for lessons learned that can help to improve safety performance. An effective OPEX feedback programme helps to improve NPP design, the requirements for and characteristics of equipment (including mechanical, electrical, and instrumentation and control systems), and operating and maintenance procedures, and it encourages greater proactivity in taking preventive measures so that NPPs operate more safely and efficiently. OPEX is also important for improving the methods and tools of analysis, thus increasing the value and validity of the analyst’s findings. However, effective analysis of incidents and events is knowledge and time intensive, and analyses performed by different parties/stakeholders are often not shared. Moreover, most OPEX minor event reports are produced in outdated and heterogeneous formats, and in many cases they are not distributed outside a facility (not even at the corporate level). This practice limits the possibilities of extracting useful knowledge with modern text analysis techniques, such as extracting and searching for concepts and finding associations between them. Taking advantage of the capability of semantic technologies to connect different terminologies that link to ever growing data repositories published on the Web in compliance with international Web standards, and to interconnect knowledge models on a multitude of subject matters, will provide new insights into lessons to be learned. It will also open up broad application scopes, with potentially beneficial impacts in new application areas (e.g. improving the design of systems and components).
Semantic engines could provide an analyst performing root cause analysis of events at an NPP with additional correlations, for instance between the number of events and the results of personnel training, the average age of the crew, and so on. Such unexpected correlations could support better root cause determination and the development of the right corrective measures. This applies especially to the analysis of minor events, since such events are numerous enough to provide a reliable basis for semantic analysis.
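The kind of correlation screening mentioned above can be sketched with a plain Pearson coefficient. The data below are made up for illustration and far too few to be meaningful; a flagged correlation is only a hint for the analyst, not evidence of a cause, which is why a large base of minor events matters. The factor names and the 0.7 threshold are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences (neither may be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

def screen(factors, events, threshold=0.7):
    """Return the factors whose correlation with the event counts is strong in magnitude."""
    flagged = {}
    for name, values in factors.items():
        r = pearson(values, events)
        if abs(r) >= threshold:
            flagged[name] = round(r, 2)
    return flagged

# Made-up illustrative data, one value per shift crew
minor_events = [12, 9, 15, 7, 11]              # minor events per year
factors = {
    "training_score": [71, 80, 65, 88, 74],    # mean result of personnel training
    "crew_mean_age": [40, 44, 41, 39, 43],
}
print(screen(factors, minor_events))
```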
Furthermore, making minor event repositories interoperable will improve the analysis of such events and the extraction of lessons learned.
4.3.7. Plant information models (PIMs)
In an NPP, multiple information systems and databases from different vendors and for different purposes are in use. Most of these systems are not integrated with each other and cannot share plant data throughout the plant life cycle. This results in redundancies in capturing, handling, transferring, maintaining and preserving a plant’s data. Interoperability problems can stem from the fragmented nature of the industry, paper-based business practices, a lack of standardization and inconsistent technology adoption among stakeholders. Recent exponential growth in computer, network and wireless capabilities, coupled with more powerful software applications, has made it possible to apply information technologies in all phases of a facility’s life cycle, creating the potential to streamline historically fragmented operations.
The focal point for consolidating all these diverse data management tasks is a plant information model that is comprehensive and detailed, and that can be integrated and interoperate with design requirements, plant design, and operations and maintenance processes, as well as with the databases, document systems and records systems of the organizations that own and operate the plant. These advanced technologies provide an opportunity to radically improve knowledge capture, integration and transfer between stakeholders, provided that industry wide standards are developed and widely used. Accordingly, new NPPs are being designed, procured and constructed using modern computer-aided engineering (CAE) and computer-aided design (CAD) systems with three-, four- and higher-dimensional modelling, together with data, databases and electronic document sources.
Semantic technologies provide the glue to link all the information and documents. For instance, a component belonging to a system that is described within a taxonomy listing all the systems and subsystems of an NPP may be interlinked with its design specifications, its maintenance protocols, its history of maintenance, operation and failures, or documents pointing to its safety relevance and safety analyses. These information sources usually exist in different formats and different repositories, or even reside in outside organizations (e.g. the manufacturer). The role of semantic technologies will be manifold: the taxonomy of systems and subsystems will be particularly useful if it is based on standards, enabling unequivocal references in the form of URIs, and other information from various systems, such as plant data or documents (which will themselves be identified by unique names), will be linked to it. In the process industry, ISO 15926, a standard for data integration, sharing, exchange and handover between computer systems, defines a reference data library (RDL) containing the terms used within process industry facilities in the form of an extensive ontology that is reusable in many applications and for many purposes. The RDL covers many of the artefacts used in the engineering, construction and operation of nuclear facilities; the extensions needed for nuclear specific terms are under discussion.
On top of a system based on semantic technologies, sophisticated queries will be available to support the generation of reports. Applications developed on the basis of such a PIM will be able to reference all items by their unique identifiers and place them in new contexts.
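A minimal sketch of how a PIM could link a component to information held in different repositories and answer a report-style query. Identifiers such as `plant:RCP-1` or `cmms:WO-2001` are hypothetical, with prefixes standing for the source systems; a real PIM would use URIs, possibly drawing classes from the ISO 15926 RDL, which is not modelled here.

```python
# Hypothetical PIM fragment: system taxonomy plus links to other repositories
PIM = [
    ("plant:RCS", "hasSubsystem", "plant:RCS-PumpTrain"),
    ("plant:RCS-PumpTrain", "hasComponent", "plant:RCP-1"),
    ("plant:RCP-1", "specifiedBy", "docs:SPEC-0001"),          # document system
    ("plant:RCP-1", "maintenanceRecord", "cmms:WO-2001"),      # maintenance system
    ("plant:RCP-1", "maintenanceRecord", "cmms:WO-2002"),
    ("plant:RCP-1", "safetyAnalysis", "safety:SA-17"),         # safety analysis archive
    ("plant:RCP-1", "manufacturerDatasheet", "vendor:DS-RCP"), # outside organization
]

def components_of(system):
    """All components below a system, following the system/subsystem taxonomy."""
    found, stack = [], [system]
    while stack:
        node = stack.pop()
        for s, p, o in PIM:
            if s == node and p == "hasSubsystem":
                stack.append(o)
            elif s == node and p == "hasComponent":
                found.append(o)
    return sorted(found)

def report(system):
    # For each component, collect everything linked to it, whatever the source repository
    out = {}
    for comp in components_of(system):
        links = {}
        for s, p, o in PIM:
            if s == comp:
                links.setdefault(p, []).append(o)
        out[comp] = links
    return out
```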
So far, efforts in developing PIMs have centred on utilizing a variety of data sources (‘data-centric’ PIM). The ultimate goal of the PIM reaches further and is best reflected in the definition of a PIM adopted by the IAEA: “A Knowledge-centric Plant Information Model is a semantically organized set of information describing plant structures, systems and components, incorporating relationships and rules within a knowledge framework that collectively form enriched representations of the plant that provide shared knowledge services and resources over its life cycle” [50]. The transition from a data-centric to a knowledge-centric approach will enable easier, faster, more accurate and more sustainable NPP design, as well as easier sharing, exchange and transfer of design knowledge and information across the NPP life cycle.
4.3.8. Crisis management, emergency preparedness and Semantic Web
The need to align terminology when processing information during crisis management has been recognized as highly relevant: “In crisis management, different domain vocabularies are used by different crisis information systems. This presents a challenge to exchanging information efficiently since the semantics of the data can be heterogeneous and not easily assimilated. For example, the word ‘Person’ can have different meanings — a ‘displaced person’, ‘recipient of aid’, or ‘victim’. Semantic interoperability is a key challenge to interoperability” [51].
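The ‘Person’ example above can be made concrete with a mapping from each system’s local terms to concepts of a shared KOS. The system names, the mapping table and the `kos:` identifiers below are hypothetical; the point of the sketch is that a record keeps its original term alongside the shared concept, and that an unmapped term is reported rather than guessed.

```python
# Hypothetical mappings from each crisis information system's local terms to shared KOS concepts
SYSTEM_MAPPINGS = {
    "shelter_registry": {"Person": "kos:DisplacedPerson"},
    "aid_distribution": {"Person": "kos:AidRecipient"},
    "casualty_tracking": {"Person": "kos:Victim"},
}

def align(system, record):
    """Replace a record's local type by the shared concept, keeping the source term."""
    local = record["type"]
    shared = SYSTEM_MAPPINGS[system].get(local)
    if shared is None:
        # Refuse to guess: an unmapped term has to be added to the mapping first
        raise KeyError(f"{system}: no mapping for local term {local!r}")
    return {**record, "type": shared, "source_type": f"{system}:{local}"}
```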
Several publications show that using an ontology helps to share information and to achieve interoperability between multiple information sources in crisis management. However, the question of which ontologies might be useful in emergencies remains to be resolved, as there are no officially registered or recommended ontologies [52].
In the specific case of a nuclear emergency, a KOS would have to be developed by the stakeholders involved. Existing ontologies in the fields of health care and pathology could then be linked to it to provide access to additional important information.