Monitoring and Diagnostics: When Equipment Speaks Part 1 – Power Panel Discussion

Alan Ross: My two guests for our Power Panel today are Tony McGrail, Senior Solutions Director of Asset Management and Monitoring at Doble, and Nathan Jacob, Senior Asset Specialist for Camlin Energy. Gentlemen, thank you for joining us.

We will present the first part of our Power Panel discussion in this issue and Part Two in our April/May issue, where we will look at where the industry is going in the use of advanced testing, AI and ML. For now, please share your experience and your perspective on where we have been and where we are now, as an industry, when we consider asset monitoring.

Let me start with you, Nathan.

Nathan Jacob: In the early stages of monitoring, I think a lot of people conceived these as devices to alert and alarm primarily. They were like protection systems, where it was just binary states. Is the alarm alarming or not?

Now there’s a shifting perspective where people are looking to gain more value from the data they’re collecting, and they want to use this data to make maintenance decisions. Condition-based maintenance means not just having an early warning of failure, but really optimizing the asset. If there is a way to be predictive in their maintenance, they’re focusing on that as well.

Another key distinction today is greater integration, moving the data to the cloud and having better visualization of the data and better interpretability of the data. With the data being more accessible, and easy to visualize in a single interface, they can have their SMEs review that data more effectively and make better decisions on that basis.

I think the last point I would make is that for single assets, like transformer DGA monitoring, there’s also a lot of motivation to do more holistic monitoring, combining sensors for different components on a single asset. For power transformers specifically, it’s the combination of DGA monitoring with partial discharge monitoring, with bushing monitoring, with monitoring of arcing and other system events, and really having all that data combined in their analytics to make more effective decisions from that data.

AR: Thank you. Tony, your perspective?

Tony McGrail: I think it’s a journey of what you might call a triumph of hope over expectation. Many years ago, when you got into condition monitoring, it was a box with lights on. It was something that they did over there. Over time people have realized that monitoring is a useful thing rather than an engineering toy. As the photo above indicates, this goes back with development at National Grid UK of a multi-gas DGA system, 25 plus years ago. It was in the real world. It worked. It did what it had to do.

The transformer that we were examining overheated substantially. To analyze the oil I had to take it out of the main tank and into my little examination cabin, which I would call a relocatable DGA device, not necessarily portable, but relocatable, where we did lots and lots of analysis and experiments with all the available devices that were on the market, including our own.

My main concern was that the oil would leak out from that and get into the local reservoir, which would be feeding northeast London, and I’d be on the 06:00 news. So, my own monitoring, on the first night I stepped in the cabin, was to get up and walk around the whole thing every 30 minutes. What this reminds me of is not just that we’re doing something beautifully technical, but that there are consequences if this thing goes wrong. There are consequences not just if the data is not valid or correct or precise or relevant, but also if we analyze it without reference to the context, in isolation.

What I’ve seen over the years is the move away from condition monitoring being something that they do to being something that the organization does, for organizations which are aware of, for example, ISO 55000 asset management, which has as a requirement that you monitor the performance of your assets. It doesn’t prescribe how to do it, but it says, as you have these assets, you need to know which ones are in good condition, and you need to know what the plans are for each asset.

That has changed things where the context has changed. Condition monitoring becomes a lot more vital for the whole organization, and it needs to be embedded in the organization. People have to understand what value it can bring.

When we talk to, for example, insurance companies, they will tell us that the big loss payouts, for example for transformer failures or substation asset failures, are not to do with the replacement cost of the asset, but with the business interruption. That identification, not just of the technical need, but of the business need and the asset management context, is hugely important in deciding which box, with which lights, and then how we use it: what our expectations are, and how to make sure that we get value from it so we don’t go “from ignorance to negligence”. This is a quote from Tommy Salmon, formerly of Dominion Energy, now at GE, because they did have cases where the data implied one thing, and they did something else. People who are making the decisions need to have something which is useful, contextual, and supports decision making.

DATA-DRIVEN DECISION MAKING

AR: Let’s go from the idea of machines collecting data to data-driven decisions with subject matter experts. Tony, I’m going to start with you on that one. Where are we right now with that?

TM: I think it depends on where you look, since the whole role of condition monitoring is to provide data for supporting decisions. These decisions previously were often made by default. If no one’s complained that it’s gone dark in their town, then our transformer in our substation is likely to be okay, so let’s just leave it alone. Now that we have data, it behooves us to make sure that we do something useful with it. And it’s also a case of there’s no point in putting a condition monitoring device on anything, unless we have some expectation as to what the numbers should be.

As the slide on the left shows, this is very simple condition monitoring from something which is less abstract than a power transformer. This is from just an automobile, a car, and we’re looking at the tire pressures. I’ve got four tires; I’ve got three at 37 and one at 39. Did one of them rise or did three of them fall? Condition monitoring on the tires is based on, in this case, what I expect it to be.

In the right case, I’ve got four tires, two at 38, one at 29, and one at 51. This is condition monitoring data. I have got it. It’s valid. What on earth do I do with it? So not only do you have to have expectations, you also have to have a plan to respond. And to have a plan to respond, you need to know what levels or trends or diagnostics are included in a way that you can say, I need to do something now.

That plan has to be in place from when you install the monitor, because if it isn’t, what will happen is that an alert will come in, and then you’ll call a council of the wise and come together and start to discuss data and timescales and rates of change and consequence, and the thing will fail in the meantime. You’ve got to be very, very careful that condition monitoring is seen as something which is managed ahead of time.
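As an illustration of the point about expectations and a pre-agreed response, the tire example can be sketched in a few lines of Python. This is only a minimal sketch: the `ResponsePlan` class, the nominal pressure of 40 psi, and the two deviation limits are assumptions made for the example, not values given in the discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponsePlan:
    """A response plan that is fixed before any reading arrives."""
    expected: float   # what the reading should be (assumed value)
    tolerance: float  # deviation that still counts as normal (assumed)
    urgent: float     # deviation that demands immediate action (assumed)

    def action(self, reading: float) -> str:
        deviation = abs(reading - self.expected)
        if deviation <= self.tolerance:
            return "no action"
        if deviation < self.urgent:
            return "inspect at next opportunity"
        return "act now"


# Hypothetical expectation for the rental-car example: 40 psi nominal.
tire_plan = ResponsePlan(expected=40.0, tolerance=3.0, urgent=8.0)

# The four readings from the second tire example in the discussion.
readings = {"tire_1": 38, "tire_2": 38, "tire_3": 29, "tire_4": 51}
actions = {tire: tire_plan.action(psi) for tire, psi in readings.items()}

for tire, act in actions.items():
    print(f"{tire}: {readings[tire]} psi -> {act}")
```

The design choice worth noting is that the limits and the actions are defined when the plan object is created, so a reading maps directly to an action without a meeting being needed first.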

We used to say that there were two main reasons to do condition monitoring. One was to learn about the monitor and one was to learn about the asset. These days, there’s a third reason, and that’s to learn whether the organization is capable of managing the whole of the condition monitoring process, from specifying a monitor through to gathering the information and making decisions with it. That’s a big if, because a lot of people are not prepared for this. They install the monitor and it really is just a box with lights.

I would say that people who are good at condition monitoring tend to be good asset managers as well. There is a corollary to that from Guy Don Schubert of Marsh Insurance who said: “People who are good at asset management are also good at condition monitoring. But the reverse, just applying condition monitoring does not make you a good asset manager.” It just means you’re good at spending money. You must be very careful spending money.

AR: Going back to the tire pressure illustration, were the 51 and 29 bad and the 38 was normal? Tony, you can’t just leave us hanging.

TM: For most people, two of them were okay, slightly down. This was an actual rental I had, where one tire was blown up at the garage, the 51, so it was way too high. When I called the rental agency, they said I’d be fine, just leave it for a week since you’ve got the car for a whole week. Just leave it, you’d be fine. The theory is they know what they’re doing, and since my knowledge on these things is relatively limited, I had to trust the SMEs.

AR: That’s an interesting point. The asset owner didn’t really have a plan. If that 51-pound tire had blown up because you hit a bump or the other one at 29 pounds had gone flat because it was already going flat, you’re the guy stuck out there wondering, what am I going to do now?

Nathan, same question for you, as we look at this idea of data, and data-driven decision making.

NJ: I like the sentiment in Tony’s comments; it makes a lot of sense in terms of a monitoring strategy. A lot of utilities will approach the problem knowing that they just want better reliability out of their asset, and then they’ll deploy monitoring without understanding the impact of that on their organization: where the data is going, who will respond to alarms, and how to respond to alarms as part of a larger strategy.

It is an important consideration to really think about what your overall strategy for monitoring is, but then also for the asset manager, and the use of data, and the analysis of the large volumes of data.

In the slide on the next page we see an example of an intelligent algorithm that utilizes data from multiple sources. This is a real example of our algorithm processing data from the original online monitoring information. In the upper left corner, we see online trends of DGA information. We can see clearly later in the trend that there’s a sudden rise and some step increases in gassing.

Of course, the monitor alerted the users to that condition, but that bit of information alone, knowing that there was an alarm, doesn’t determine the actions we must take immediately. We have a few analytical tools displayed here, and in the bottom left you see what’s called the DGA matrix. That is an evaluation of the different concentrations of gas by different industry standard diagnostic methods. You can see most of them indicate a thermal fault. There’s some variability in the diagnostics. Different methods will determine different classes or outcomes for the type of diagnosis or the severity of the fault, but it is at least a clue.
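The idea of reading several diagnostic methods side by side can be sketched as a simple tally. This is a minimal sketch and not the algorithm shown in the slide: the `consensus` function, the generic method names, and the outcomes in `matrix` are all invented for illustration.

```python
from collections import Counter


def consensus(diagnoses: dict[str, str]) -> tuple[str, float]:
    """Return the most common diagnosis and the share of methods that agree."""
    counts = Counter(diagnoses.values())
    label, votes = counts.most_common(1)[0]
    return label, votes / len(diagnoses)


# Illustrative outcomes only; these are not the values from the slide.
matrix = {
    "method_a": "thermal fault",
    "method_b": "thermal fault",
    "method_c": "thermal fault",
    "method_d": "discharge",
}

label, agreement = consensus(matrix)
print(f"{label} ({agreement:.0%} of methods agree)")
```

A tally like this only summarizes where the methods agree. As the discussion goes on to say, it is a clue about the class of fault, not a diagnosis of the actual problem.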

When you look at this, what does a thermal fault there mean actually? The fact that the monitor detected something, it did its job, you’re still not really any further ahead in terms of diagnosing the actual problem. In terms of integrating a kind of holistic solution and having more information working together in concert with each other, to give a clearer picture of what the issue is inside of an asset, that’s really what we’re going for here.

And when that data, the DGA information, is processed, it immediately throws up a couple of suspected potential fault conditions. That is very important because it at least equips the asset manager with the information to know these are potential areas they could look at next. There are a few suggestions for potential faults indicated in yellow, and then, with a lesser degree of confidence, the other potential faults indicated in green. In the green highlighting in the wheel you can see some of the suggested potential faults: overheated cellulose, overheated bare conductors, overheating structures, and then core insulation failure. The identification of potential faults tells the asset manager to attempt one of the recommended actions here, the core insulation test. We can see in the table above, where the yellow arrow is pointing, that two of the tests failed, indicating a short between the frame and the core.

Clearly there’s a core insulation failure with a higher degree of confidence that this is the identified failure mechanism. This is the type of visualization we have, the type of concept we have, indicating that you should really integrate all the tests, all the monitoring data, all the maintenance that’s applied on the transformer or any asset, and attempt to have this kind of decision support tool that an asset manager can use to help them determine the next step. But then, when they have a larger body of data to do analytics on, it will give a clearer picture of what the potential fault conditions are. A lot of it is like a logical model, applying logic and applying correlations that a human expert would apply. This is the type of tool that could be used for that type of purpose to integrate all the data that’s being collected.
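The "logical model" described here, where candidate faults from DGA are narrowed down by follow-up test results, might be sketched as below. This is a hedged sketch under stated assumptions and not the vendor's tool: the `Candidate` class, the two confidence labels, and the `apply_test_results` function are hypothetical, and only the fault names and the core insulation test come from the discussion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    fault: str
    confidence: str                        # "lower" or "higher" (assumed labels)
    confirming_test: Optional[str] = None  # test that would confirm this fault


def apply_test_results(candidates, failed_tests):
    """Promote any candidate whose confirming test has failed, then rank."""
    updated = []
    for c in candidates:
        confirmed = c.confirming_test in failed_tests
        confidence = "higher" if confirmed else c.confidence
        updated.append(Candidate(c.fault, confidence, c.confirming_test))
    # Higher-confidence candidates first; the stable sort keeps the rest in order.
    return sorted(updated, key=lambda c: c.confidence != "higher")


# Candidate faults suggested from DGA alone, as named in the discussion.
from_dga = [
    Candidate("overheated cellulose", "lower"),
    Candidate("overheated bare conductors", "lower"),
    Candidate("overheating structures", "lower"),
    Candidate("core insulation failure", "lower", "core insulation test"),
]

ranked = apply_test_results(from_dga, failed_tests={"core insulation test"})
for c in ranked:
    print(f"{c.confidence:>6}: {c.fault}")
```

The point of the sketch is the sequence: the monitor raises candidates, a recommended test is carried out, and the test result is what raises confidence in one failure mechanism over the others.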

AR: Tony, do you have a comment?

TM: Nathan, that is a good example. I had a question about the step in the data and how long it was between the step appearing and the transformer being taken out of service, because it looked as if it was about ten days from the axis.

NJ: Yes, that is correct.

TM: When you have the data suddenly rise, to wait to get more data is a decision that has to be made or not made, since the default situation is to keep it in service. That step by itself should be an indication that there needs to be a plan in place to respond, because you’ve had a change in the DGA levels. How long was it left in service and energized before it was removed?

NJ: Utilities aren’t always so responsive to take an outage, so it is a kind of assumed risk which they may feel is necessary to take in certain moments. We talked about the strategy and responding to alarms. That is certainly something that could be defined in the strategy. What is the change in gassing on a monitoring device that has us respond and interrupt the unit immediately? Some of that should be managed through alarm management, right? You should not just wait on feedback from the diagnostic tool and the additional data that’s collected over a period of time, but you should have criteria set out as a part of your larger strategy when you take an outage in response to an alarm condition, and what the thresholds are for that.
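Setting out the criteria ahead of time, as described above, could look something like the following. It is a minimal sketch: the thresholds in `CRITERIA`, the action names, the baseline window, and the ethylene trend are all invented for illustration and are not recommended limits.

```python
from statistics import mean

# Hypothetical criteria agreed when the monitor is installed:
# (minimum rise in ppm above the recent baseline, action), highest first.
CRITERIA = [
    (50.0, "take outage and test"),
    (20.0, "resample and notify SME"),
]


def step_size(trend, baseline_points=3):
    """Rise of the latest reading above the mean of the readings before it."""
    if len(trend) <= baseline_points:
        raise ValueError("need more readings than baseline_points")
    baseline = mean(trend[-baseline_points - 1:-1])
    return trend[-1] - baseline


def planned_action(trend):
    """Map the latest step in a gas trend to the pre-agreed action."""
    rise = step_size(trend)
    for threshold, action in CRITERIA:
        if rise >= threshold:
            return action
    return "continue monitoring"


# Invented ethylene trend (ppm) with a step on the last sample.
ethylene_ppm = [10, 11, 10, 12, 75]
print(planned_action(ethylene_ppm))
```

Because the table of criteria exists before the alarm arrives, the response to a step change is a lookup rather than a debate, which is the point both speakers make about alarm management being part of the larger strategy.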

AR: Which is having a plan, a strategy, right?

TM: I’ve got a case of a bushing monitor and the owners of, I think, about 70 transmission class transformers. They had a lot of bushings of a type which fail quickly, and they had the bushing monitor. When the bushing gave a top level alert, operations had to switch the transformer out in two minutes. Now, these are transmission units; it is like trying to stop a battleship. It’s a lot of work to do. They don’t stop very easily. And that was their written and agreed policy when they installed a monitor. That, to me, makes sense, because they knew they were going to fail. They knew that they had this failure mode. And that’s the trench bushing problem. The plan was put in place when the monitor was installed, and it was two minutes. Admittedly, they did have some false positives, but they worked out why. The false negatives were avoided, and the false positives are the price you pay to avoid false negatives. The result was they saved transformers, and they saved multi-multi-millions of dollars as a result. They’re still doing so now. But as soon as I saw that step, I’m like, why aren’t we doing something right there?

What was done then? Did they think, okay, we’ll just leave it and see what happens? Because we’ve had that as well. And then it fails, and then people say, your monitor didn’t do anything. But yes, it did. Look, it told you, and you hit the button saying, do something, acknowledge, acknowledge. The plan must be agreed and in place by all parties as early as possible.

AR: Yeah, you’re hitting the point that when the equipment speaks, we are supposed to listen. Sometimes it speaks very loudly, and sometimes it speaks a little less loudly, but trending. I think, Tony, that’s kind of your point. When in the trend do you suddenly take a piece of equipment out of service? Because if you begin to see it, that step function, as you said, was big. In Nathan’s slide, that first step function was pretty mean if you looked at it, and I can’t read exactly what the gas was. That first step function, you’re pointing to the top left, is that correct, Tony?

TM: Yes.

NJ: So, in the initial step, the increases in both ethylene and methane are quite dramatic. The ethylene increase may already be a clue in the DGA that this is a metal-to-metal contact that’s causing the heating, but it still requires the additional evaluation, the additional testing, to be confident of the failure mode. I know we’re going to go into machine learning and AI as part of the discussion as we proceed here, but I think there are some thoughts that maybe you could have an AI applied just to trends like this and identify faults. I tend to think that’s unlikely. So there is first the warning from the monitoring system, then the initial diagnostic, and then the decision support: what should we do next? I think those are all key elements, and I think you do need SMEs working with tools like this to make those sorts of decisions.

TM: I think Nathan’s got a very good case there. It’s an interesting example of the diagnostics coming into support and getting contextual data. The problem on the asset management side is if you take a transformer out because you found condition monitoring data that indicates a possible problem, and you don’t find a problem, when you go and dive inside and you do all the testing that can happen, you get yelled at for the fact that you’re not using the transformer to generate money somewhere. If you don’t take it out and it fails, you also get yelled at. And yes, I’ve been yelled at. So, there are two places that you really don’t want to be, and you’re stuck between them, the rock and the hard place.

AR: Well, in your example, you get yelled at whichever you do, so it’s a Catch 22, right?

TM: Yes. The decision maker isn’t necessarily going to be an expert in the condition monitoring, or the technical application, or the individual measurements and the failure modes analyzed, and what might be wrong. It does get to be a difficult decision.

AR: Gentlemen, this has been a fascinating discussion about where we are now when the equipment speaks, but we must end Part One here. In our April/May issue we will address the rest of our conversation, on AI, ML, advanced monitoring and how things are changing, discussing “Where We Are Going: When the Equipment Speaks.” Thank you very much, Nathan and Tony.

This article was originally published in the March 2024 issue of the Monitoring & Diagnostics: Technology, Data Science & AI magazine.
