Our UNI Passau team specializes in Natural Language Processing and Computational Linguistics. Writing our first contribution for this blog is a challenge for us. First of all, we expect to have to explain how NLP technologies are relevant to the area this project addresses – however one understands this ‘abstraction’. People usually tend to associate the cybersecurity scene with Big Data, and to expect something even more exciting and forward-looking.
Our lives have always been subject to transformation – and so have our perceptions of safety and security (two different things, though). In Istanbul (then Constantinople) there was a gate called the Kerkoporta that was left open by accident, allowing the first fifty or so Ottoman troops to enter the city. One small, forgotten opening was enough to undo the strongest walls – a lesson that applies to digital defences just as well.
There is a book that many technologists may have overlooked: ‘Architecture Without Architects: A Short Introduction to Non-Pedigreed Architecture’. It was originally published in 1964 as the catalogue of an exhibition at MoMA, the Museum of Modern Art in Manhattan, New York, curated by Bernard Rudofsky, a Czech-American intellectual (Wikipedia refers to him as ‘writer, architect, collector, teacher, designer, and social historian’) at a time when such a title was earned only by those who deserved it. Rudofsky organized, among other things, a series of controversial MoMA exhibits in the 1940s, 1950s and 1960s – one of his lectures was titled ‘How Can People Expect to Have Good Architecture When They Wear Such Clothes?’.
So how can we expect to have cybersecurity in our digital lives, in our administrations (local or public) and in our business interactions (B2B or B2C) in the way they are currently organized? Or, to reverse the question and make it less rhetorical (and more practical): how can we transform our digital footprints to make them safer, while keeping their human face and, where possible, even improving it?
Let’s imagine a scenario of the future: Otto Normalverbraucher, this imaginary friend of John Smith in the UK and Mario Rossi in Italy, has two options:
- stay at home his entire life, locked in a golden cage full of technologies (everything mentioned above) to protect him from anything unwanted and anyone unpleasant and unplanned – but such a life is miserable in the end, isn’t it? Or…
- enrich his daily routine with support from a cybersecurity technology that follows him from his young years to his old age.
The second scenario is the better one – it makes life worth living. So let’s stay with it and elaborate on some ideas.
Our project deals with the development of a cybersecurity situational awareness and information sharing solution for use by local public administrations, making use of advanced big data analytics. As one may see, the area is vast: it could involve almost any type of research – including, as one might have suspected, Big Data. From our side, at the Lehrstuhl of Digital Libraries and Web Information Systems at the University of Passau, we saw the opportunity to introduce what we think may become a game changer: our contributions to the challenges such a project brings with it relate mainly to the provision of a natural language interface and a corresponding semantic knowledge graph that can support interactions in natural language.
One may find this approach iconoclastic or unexpected – on the other hand, it follows natural patterns and addresses what we scientists and researchers often tend to forget: the basic needs of people as drivers and motivations for our research.
Security is not at the epicentre of our Lehrstuhl’s work; all technologies we develop and deploy are based on computational linguistics and natural language processing tools for, amongst others, sentiment analysis and opinion mining. Such applications can be found in almost any area of information technology and business. And it is for this ‘ubiquity’ that we regard them as highly relevant for use in the CS-AWARE context.
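To give a flavour of what such tools look like in practice, here is a minimal sentiment analysis sketch using NLTK’s VADER analyser – the example texts are invented for illustration and are not from the project:

```python
# Minimal sentiment analysis sketch using NLTK's VADER analyser.
# Requires: pip install nltk (plus a one-time lexicon download, see below).
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

analyzer = SentimentIntensityAnalyzer()

# Invented example texts, e.g. posts one might monitor for security-relevant mood.
texts = [
    "The new login procedure is confusing and I hate it.",
    "Great update - the system feels much safer now.",
]

for text in texts:
    scores = analyzer.polarity_scores(text)  # neg/neu/pos plus compound in [-1, 1]
    print(f"{scores['compound']:+.2f}  {text}")
```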
One may now ask: and what does natural language have to do with this?
Over the last 20 or 30 years, Language Technology has evolved so that resources and processing tools are now clearly separated. In the past, rule-based systems contained very specialized lexica that had little to do with human-oriented lexica, and grammar rules that were completely ‘unnatural’ in terms of everyday human use. Today’s corpora and lexica are very close to human-oriented resources, while the tools that process them are language-independent to a considerable degree. What does this mean? In simple words: a system built for use in Hong Kong can easily be customized to address the language needs of the same system, or a hybrid of it, in Munich, Barcelona or Edinburgh. Language may still be a barrier for humans, but for machines we are very close to overcoming it. Whoever doubts this assumption need only test Google Translate – to name just one well-known example – and in the future there will be many systems that outperform Google Translate as much as a high-end Mercedes of today outperforms one of the company’s first models from about a century ago.
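This separation of language-independent tools from language-specific resources is easy to see in today’s NLP libraries. Here is a minimal sketch using spaCy, where the processing code stays identical and only the loaded resource changes (the model names are spaCy’s standard small models, which must be installed separately):

```python
# Minimal sketch: the same language-independent pipeline code,
# parameterized only by the language-specific model (resource).
# Requires: pip install spacy, plus e.g.
#   python -m spacy download en_core_web_sm
#   python -m spacy download de_core_news_sm
import spacy

def analyze(model_name: str, text: str) -> None:
    nlp = spacy.load(model_name)  # language-specific resource
    doc = nlp(text)               # language-independent processing
    for token in doc:
        print(token.text, token.lemma_, token.pos_)

# Same tool, different resources:
analyze("en_core_web_sm", "The firewall blocked three suspicious connections.")
analyze("de_core_news_sm", "Die Firewall blockierte drei verdächtige Verbindungen.")
```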
The ‘Semantic Web’, which has been regarded as the holy grail for improving the underlying technology of many reasoning systems, for example in the areas of security or smart cities, is not a panacea for semantic problems, since it is based on language, and we can safely assume that technology today can work on multimodal semantics. However, for the purposes of human communication, language remains the most important means, and in this sense the Semantic Web, which (to a large extent) employs language as its backbone and its ‘nucleus’, is the most expressive and reliable tool we have today.
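To illustrate what a language-backed knowledge graph behind a natural language interface might look like, here is a minimal sketch using the rdflib library; the namespace, entities and the `ex:targets` relation are invented for illustration and are not part of any project vocabulary:

```python
# Minimal knowledge-graph sketch with rdflib.
# Requires: pip install rdflib
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/cs-aware/")  # invented namespace

g = Graph()
g.bind("ex", EX)

# Invented facts: a threat targeting a municipal service.
g.add((EX.Phishing, RDF.type, EX.Threat))
g.add((EX.Phishing, RDFS.label, Literal("Phishing campaign", lang="en")))
g.add((EX.Phishing, EX.targets, EX.CitizenPortal))
g.add((EX.CitizenPortal, RDFS.label, Literal("Citizen portal", lang="en")))

# A SPARQL query a natural language interface could generate from
# a question such as "Which threats target the citizen portal?"
query = """
PREFIX ex: <http://example.org/cs-aware/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
    ?threat a ex:Threat ;
            ex:targets ex:CitizenPortal ;
            rdfs:label ?label .
}
"""
for row in g.query(query):
    print(row.label)
```

The point of the sketch is the division of labour: the graph stores human-readable labels alongside machine-processable triples, so a natural language front end only has to translate a question into a query over those labels and relations.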
Adamantios Koumpis
UNI Passau