“Evergreen content” = (Archive + AI) x SEO. A simple but effective algorithm
Artificial Intelligence opens an opportunity for collaboration between technology and media companies to bring their content to life.
If there is one thing that defines a medium and gives it value, it is its content. The supply, quantity and quality of the information or entertainment it offers is what makes the difference, positive or negative, compared with the alternatives. All this content is a very valuable heritage stored in the archives. Although these departments are low-profile compared with others in the media world, they are the memory that tells us who we are, where we come from and where many of today's news stories originated.
In them we find “archive” stories that enrich and complement news or entertainment. But archives can offer many opportunities beyond serving as regular support material for current content. There are already artificial intelligence (AI) tools for documentary collections, used for automatic metadata, for search or for managing large amounts of data, among other purposes.
Evergreen content
During the coronavirus pandemic, so-called “evergreen” content gained prominence. In Spanish the term could be translated as “perennial” or “timeless” content. Its name comes from evergreen plants, which keep their green leaves all year round. It refers to content that, being neither current news nor archive material, never goes out of date and deals with topics that remain relevant to audiences.
This is information that has been in high demand since the beginning of the COVID-19 pandemic: how to get in better physical shape, how to telework, how to create more suitable spaces in our homes, how to cook more healthily, and so on. Two aspects are key: the content, which remains relevant as the years pass, and the topics, which always attract interest and a significant search volume regardless of the moment.
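The second criterion, steady and significant search interest, can be made concrete by measuring how stable a topic's search volume is over time. The sketch below is a hypothetical heuristic, not a method proposed by the team: it treats a topic as evergreen when its average monthly volume exceeds a minimum and its coefficient of variation (standard deviation divided by mean) stays low. The thresholds `min_mean` and `max_cv` and the monthly figures are illustrative assumptions.

```python
from statistics import mean, pstdev


def is_evergreen_topic(monthly_searches: list[int],
                       min_mean: float = 1000.0,
                       max_cv: float = 0.3) -> bool:
    """Hypothetical heuristic: steady, sufficiently large search interest.

    monthly_searches: search volume per month for one topic (assumed input).
    min_mean: minimum average volume that counts as 'significant' (assumed threshold).
    max_cv: maximum coefficient of variation (std / mean) that counts as 'steady'.
    """
    if not monthly_searches:
        return False
    avg = mean(monthly_searches)
    if avg < min_mean:
        return False  # too little interest, however stable it is
    cv = pstdev(monthly_searches) / avg
    return cv <= max_cv


# Made-up monthly volumes, for illustration only
healthy_recipes = [4200, 4000, 4400, 4100, 3900, 4300]
election_night = [300, 250, 9000, 200, 150, 100]

print(is_evergreen_topic(healthy_recipes))  # True
print(is_evergreen_topic(election_night))   # False
```

A spike-driven topic fails on stability even when its average is high, which is the distinction the paragraph draws between evergreen topics and news peaks.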
There are many AI tools on the market, there is strong demand for this kind of content, and much of it is already in our archives. Yet there is no specific solution to facilitate this work, or to generate recommendations or specific content for specific audiences.
To do this, we could use tools that analyze the SEO of our content and the traffic of our outlets to see what interests the public, together with systems that analyze what people search for, read, listen to and watch on the web (topics and subjects) and on social networks. If AI combined them to recommend, retrieve or generate new, specific content adapted to demand, wouldn't we have a very powerful tool?
The challenge
Answering questions like this has been the objective of the international team of media professionals in which RTVE has taken part within the JournalismAI Collab, a project of the Polis research centre at the London School of Economics, supported by the Google News Initiative. Collab is a collaborative experiment in which news organizations from around the world, differing in type of media, size and audience, have come together to explore innovative ways of improving journalism using AI.
Our work focused on the possible use of AI to generate this type of content from existing archives, and on finding out whether there were tools on the market that could meet this need by creating useful, well-positioned, high-impact content that responds to the audience's demand, which, as we have seen, is high.
The final report suggests that news organizations should work with technology companies to examine these needs so that they can help develop new possibilities and tools. We spoke to some of the leading companies in the sector; here are some of their ideas.
The opinion of the specialists
For Richard Benjamins, Chief AI & Data Strategist at Telefónica, a Spanish multinational that ranks among the world's leading telecommunications companies, a solution could follow two paths.
The first would be to define what “evergreen” content is (in terms of words, images, video or sound) and then use machine learning to automatically label as evergreen the content that fits that definition. The second would be to train a deep learning algorithm on a reference set of documents and then run it over the complete repository.
Both may be possible. The question is how good the result would be, and whether it would be good enough to provide value systematically. In the end, we are talking about a company's knowledge management, a field with few success stories, even though it is technically achievable.
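Benjamins' first path amounts to a supervised text classifier. As a minimal sketch, assuming a small set of documents already labeled by hand as `evergreen` or `news` (the labels, the class name and the training headlines are hypothetical), a multinomial naive Bayes model can be written with the standard library alone. Naive Bayes is swapped in here purely for brevity; it is not the deep learning approach of the second path, and a real system would need a language-aware NLP pipeline and far more data.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a real system would use a language-aware tokenizer
    return re.findall(r"\w+", text.lower())


class NaiveBayesEvergreen:
    """Multinomial naive Bayes with add-one (Laplace) smoothing, for illustration only."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesEvergreen":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of each known word
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # ignore words never seen during training
                score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical hand-labeled training headlines
docs = [
    "how to cook healthy meals at home",
    "tips to telework from home comfortably",
    "exercises to stay fit at home",
    "minister resigns after vote tonight",
    "election results announced tonight",
    "stock market falls after today's announcement",
]
labels = ["evergreen"] * 3 + ["news"] * 3

model = NaiveBayesEvergreen().fit(docs, labels)
print(model.predict("How to stay healthy when you telework"))  # evergreen
print(model.predict("Election results tonight"))  # news
```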
Telefónica, a service provider, has a unit devoted to Big Data and AI. It does not currently work on projects related to “evergreen data”, but it could become interested in the future, since it is an attractive field that could find acceptance and a future in the market.
To define a viable product, Benjamins considers it important to run tests with users and to define its explainability and what its day-to-day use would look like. “The technology is there; it wouldn't be difficult to do,” he says.
Narrativa, an AI company specializing in automatic content generation, considers that this type of content is not only useful but the future, both for companies and for the media.
“The digital transformation of recent years, which has accelerated with the pandemic, confirms the absolute prominence of digital media, so a mere online presence is no longer enough: you need to be relevant,” says David Llorente, CEO and founder of Narrativa.
However, they identify two main difficulties when generating this type of content. First, many companies currently invest a great deal of time, money and resources in producing content manually, which means writing the texts by hand and makes the process less agile. Second, generating content is not enough: to appear in search engines, it must meet a series of requirements that depend on the needs of the outlet or company.
Narrativa is already developing this type of technology, combining specific keywords aimed at better SEO positioning. The tags it uses target very specific user searches in search engines.
As a result, the content is much closer to what potential customers want to find. Recently, they generated car descriptions for a client that went straight into the top 10 results returned by Google.
It would therefore be not only feasible but also profitable for companies, which would save time and costs. AI tools, they say, would make it possible to produce a greater variety of “evergreen” content and would free journalists to focus on higher-value tasks.
The application of artificial intelligence techniques offers undoubted advantages in many areas, such as natural language processing, but identifying evergreen content is a potentially complex problem that is hard to formulate, according to José Manuel Gómez-Pérez, Director of Language Technology Research at Expert.AI.
A priori, one might think it can be solved by training a model from scratch that, given a document, classifies it as evergreen or not. If we assume that the content itself is enough to solve the problem, and that, for example, data on the impact the content has generated over a significant period of time would not be needed, such an approach seems viable.
However, it faces a number of challenges, such as building a large enough corpus of documents and labeling them to train the model. It is technically feasible, he believes, but it requires resources to create and label that dataset, which can involve a significant investment depending on the volume that needs to be extracted and annotated.
He finds it much more promising to apply techniques based on pre-trained models that only need to be fine-tuned for this specific task, or approaches based on rules, formulated by a knowledge engineer, that reflect their understanding of what evergreen content can be.
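The rule-based alternative can be sketched as a set of weighted patterns that a knowledge engineer writes and maintains. The rules, weights and threshold below are purely hypothetical illustrations, not Expert.AI's rules; the point of the sketch is that every decision can be traced back to the rules that fired.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    weight: int  # positive favors evergreen, negative favors time-bound news


# Hypothetical rules a knowledge engineer might write
RULES = [
    Rule("how-to", re.compile(r"\bhow to\b", re.I), 2),
    Rule("advice", re.compile(r"\b(tips|guide|explained)\b", re.I), 1),
    Rule("time-bound", re.compile(r"\b(today|yesterday|tonight|breaking)\b", re.I), -2),
    Rule("explicit year", re.compile(r"\b(19|20)\d{2}\b"), -1),
]


def evergreen_score(text: str) -> tuple[int, list[str]]:
    """Return the total score and the names of the rules that fired (each rule counts once)."""
    fired = [r for r in RULES if r.pattern.search(text)]
    return sum(r.weight for r in fired), [r.name for r in fired]


def is_evergreen(text: str, threshold: int = 1) -> bool:
    # The threshold is an assumed tuning knob, not a recommended value
    score, _ = evergreen_score(text)
    return score >= threshold


print(evergreen_score("How to set up a home office: tips and a guide"))
# (3, ['how-to', 'advice'])
print(evergreen_score("Breaking: parliament votes tonight on the 2021 budget"))
# (-3, ['time-bound', 'explicit year'])
```

Unlike a trained model, this needs no labeled corpus, but its coverage is only as good as the rules; in practice the two are often combined.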
At Expert.AI they have tackled similar problems in areas such as the analysis of jihadist narratives or the detection and analysis of disinformation in online media. In their own way, both those narratives and the core topics that disinformation focuses on are evergreen content, designed to capture the attention of their target audience in a timeless manner. The optimal solution, in his view, is an alliance between artificial intelligence and the users it assists: a partnership that feeds back into AI systems, which learn from user feedback and offer increasingly better predictions.
The Danish technology company Spor.ai advises giving decision-making power back to the journalist: the AI generates a list of suggestions based on one or several combinations, which the journalist can then refine by applying a set of filters.
One option would be regular dropdown menus, although Spor.ai finds it more useful to display the results as a knowledge graph. The journalist could then edit and filter, directly on the graph view, the relationships between the entities that define the result. This keeps an overview of the chosen relationships, which are harder to see with regular filters.
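A minimal sketch of this kind of filtering, assuming the archive has already been turned into (subject, relation, object) triples. The entities, relation names and article IDs are hypothetical, and this is not Spor.ai's implementation: the journalist's filter is simply the set of relation types to keep, and the result is the set of articles connected to a seed entity through those relations.

```python
from collections import defaultdict, deque

# Hypothetical knowledge graph as (subject, relation, object) triples
TRIPLES = [
    ("article:101", "mentions", "Telework"),
    ("article:102", "mentions", "Home office"),
    ("article:103", "mentions", "Ergonomics"),
    ("article:104", "mentions", "Stock market"),
    ("Home office", "related_to", "Telework"),
    ("Ergonomics", "part_of", "Home office"),
    ("Stock market", "affected_by", "Telework"),
]


def related_articles(seed: str, allowed: set[str], max_hops: int = 2) -> set[str]:
    """Articles mentioning the seed or any entity reachable from it through the
    relation types the journalist has kept (edges treated as undirected)."""
    graph = defaultdict(set)
    for s, rel, o in TRIPLES:
        if rel in allowed and rel != "mentions":
            graph[s].add(o)
            graph[o].add(s)
    # Breadth-first search, stopping after max_hops
    reached, queue = {seed}, deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in graph[node] - reached:
            reached.add(nxt)
            queue.append((nxt, depth + 1))
    return {s for s, rel, o in TRIPLES if rel == "mentions" and o in reached}


print(sorted(related_articles("Telework", {"related_to", "part_of"})))
# ['article:101', 'article:102', 'article:103']
print(sorted(related_articles("Telework", {"related_to", "part_of", "affected_by"})))
# ['article:101', 'article:102', 'article:103', 'article:104']
```

Toggling a single relation type (here `affected_by`) adds or removes results, which is the kind of control a graph view makes visible.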
Group conclusions
Although we did not get as far as developing an imaginary universal tool, which we called “ArcAI”, we did manage to bring together a great deal of valuable experience and knowledge showing that it is possible to build solutions that make the most of archives using AI-based tools, and that some existing tools could already be useful, at least in part. We also identified a series of challenges and limitations, as well as some basic questions that need answering about what we want to achieve.
There is great potential in the archive, but what are the specific needs of each newsroom? There is no point in developing an advanced search tool if all you need is to add a metadata tag to a specific type of content or to set up simple recurring notifications. Different newsrooms have different needs, as well as different definitions of, and goals for, what evergreen content really means to them.
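Simple recurring notifications can be as modest as anniversary reminders. The sketch below assumes each archive item carries a publication date; the `ArchiveItem` type, the five-year cycle and the sample titles are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchiveItem:
    # Hypothetical minimal record; a real archive holds far richer metadata
    title: str
    published: date


def anniversaries(items: list[ArchiveItem], today: date,
                  every_n_years: int = 5) -> list[tuple[ArchiveItem, int]]:
    """Return items published on today's month and day, a multiple of
    every_n_years ago, together with the number of years elapsed.

    Items dated 29 February only match when today is 29 February.
    """
    hits = []
    for item in items:
        years = today.year - item.published.year
        same_day = (item.published.month, item.published.day) == (today.month, today.day)
        if same_day and years > 0 and years % every_n_years == 0:
            hits.append((item, years))
    return hits


archive = [
    ArchiveItem("Cooking at home", date(2015, 3, 14)),
    ArchiveItem("Setting up a home office", date(2022, 3, 14)),
    ArchiveItem("Summer fitness plan", date(2020, 6, 1)),
]
for item, years in anniversaries(archive, today=date(2025, 3, 14)):
    print(f"{years} years ago: {item.title}")
# prints "10 years ago: Cooking at home"
```

A job like this can run daily without any AI at all, which illustrates why each newsroom should first pin down what it actually needs.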
Since very few tools are available, each newsroom should decide which solution it needs. The more advanced the technical methods, the more development work they will require.
Using natural language processing (NLP), named entity recognition (NER) and machine learning (ML), combined with manual labeling and/or knowledge graph filters, you can get fairly accurate results on your archives. But would it be enough to put a search field in the content management system (CMS)? What criteria should define a good match? How much filtering work will be left to the journalist?
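One possible, deliberately simple criterion for a good match is the overlap between the named entities of the story being written and those of each archive item. This sketch assumes an NER step has already produced the entity sets; the Jaccard measure, the `threshold` and the `top_k` cut-off are illustrative choices, not recommendations from the group.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of entities two items have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_matches(story_entities: set[str],
                 archive: dict[str, set[str]],
                 threshold: float = 0.25,
                 top_k: int = 3) -> list[tuple[str, float]]:
    """Rank archive items by entity overlap with the current story.

    archive maps an item ID to the entities an NER step found in it (assumed input).
    threshold and top_k are hypothetical knobs for what counts as a 'good match'.
    """
    scored = [(item_id, jaccard(story_entities, ents)) for item_id, ents in archive.items()]
    scored = [(item_id, s) for item_id, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical entity sets
story = {"Telework", "Madrid", "COVID-19"}
archive = {
    "a1": {"Telework", "COVID-19"},
    "a2": {"Madrid", "Football"},
    "a3": {"Elections"},
}
print(best_matches(story, archive))
```

Raising `threshold` or lowering `top_k` shifts filtering work from the journalist to the system, which is exactly the trade-off the questions above point to.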
When working with the archive, it is essential for the database and metadata to be coherent and well structured. The better the structure, the easier it will be to exploit the database with artificial intelligence tools.
To implement a tool like this, whether based on manual labeling, scanning methods or any other technology, you also need the support of the organization and its staff. It makes no sense to develop tools that end up wasting time, resources and money because they clash with, and are neutralized by, certain corporate “cultures” or a lack of motivation and involvement from their intended users.
For non-English-language media, language is a decisive factor if you decide to use some of the technologies on the market, such as Parse.ly or Chartbeat, since in most cases their algorithms have been trained on English or Chinese and perform considerably better in those languages than in others. Whether the technology is your own or a third party's, it is best to train the tool on the content of your own archives to get the results that best meet your needs.
The opportunities include notifying journalists when older content resurfaces in search engines, achieving better SEO positioning, suggesting related and relevant stories, and reusing elements of previous content to create timelines or other formats, among many others.
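The first of these opportunities, alerting journalists when older content resurfaces in search engines, can be approximated by comparing each archived page's latest search traffic with its own recent baseline. This sketch assumes weekly click counts exported from a search-analytics tool; the URLs, the median baseline and the `factor` and `min_clicks` thresholds are hypothetical choices.

```python
from statistics import median


def resurfacing(weekly_clicks: dict[str, list[int]],
                factor: float = 3.0,
                min_clicks: int = 50) -> list[str]:
    """Flag archive URLs whose latest week of search clicks far exceeds their usual level.

    weekly_clicks maps a URL to its search-referral clicks per week, oldest first
    (assumed input); the last value is the current week.
    """
    flagged = []
    for url, clicks in weekly_clicks.items():
        if len(clicks) < 2:
            continue  # not enough history to compare against
        *history, current = clicks
        baseline = median(history)
        # max(baseline, 1) avoids flagging everything when the baseline is zero
        if current >= min_clicks and current > factor * max(baseline, 1):
            flagged.append(url)
    return flagged


# Made-up traffic, for illustration only
traffic = {
    "/archive/home-office-guide": [10, 12, 8, 11, 95],
    "/archive/old-match-report": [5, 4, 6, 5, 7],
    "/archive/new-page": [200],
}
print(resurfacing(traffic))  # ['/archive/home-office-guide']
```

The `min_clicks` floor keeps small random fluctuations on low-traffic pages from triggering alerts.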
Perhaps the main outcome of our team's work is a call for technology companies to get involved and join forces with the media to develop accessible tools that bring already published content back to life and help turn the enormous potential of archives into journalism.
David Corral
RTVE Innovation
Article originally published in the Observatory for the innovation of News in the Digital Society (OI2)