Saturday 10 September 2016

A tale of two SAP incidents

Summer is a strange time for SAP teams: lots of people go on holiday, projects are left to tick over, burning issues are put on hold. It is a time for those who remain to take it easy for a while, take a look at those really intractable problems or catch up on technical innovations and pet projects with no one to bother them. This is all fine in theory except that the business doesn't rest and production incidents still occur. This is the story of two such incidents and how SAP support helped us.

The first occurred right at the start of August, when nearly all my development team had already left: the SWN_SELSEN production job had started to abend. I groaned inwardly when I heard the news. Nothing could be simpler on the face of it than SWN_SELSEN: it selects workflow notifications and sends them out. There is no variant, but the customising is fearsomely complicated and you need to be a deeply experienced workflow guru (something I am not) to understand the code and the exits. This was a high-profile problem too - though the business impact was small, the program was sending PO approval notifications to all the top guys in the company, so when these stopped, important people noticed.

I had a look at the dump. The problem occurred about 20 levels deep with a TSV_TNEW_PAGE_ALLOC_FAILED: the program had run out of memory while extending an internal table. The source throwing the error was SAP for sure, but there was a little cluster of BAdI code around 15 deep in the stack, before the code returned to standard SAP. The BAdI changes had been imported the previous week - I had found the smoking gun. Unfortunately, dear reader, as you may be aware, developers lack a certain credibility, and though I strongly suspected the custom code was causing the issue, I couldn't be 100% sure. My lack of certainty, combined with my colleagues' 100% certainty that the changes had been thoroughly tested with no problem in our quality system, turned the spotlight back onto standard SAP code. We looked at the customising and tried to understand it. In the meantime, as Max Attention customers we felt emboldened enough to turn to SAP and opened an incident.

We changed the customising and imported it into production - no effect. Then we received an almost cheerful answer from SAP support that our BAdI code WAS causing the problem, and that we should check out some code which called a certain SAP module. Damn - this was embarrassing! I should have examined this code more closely, but it was complex and just looking at it made me want to get a coffee and mindlessly browse Facebook.

Somehow I figured out that the SAP module (in case you're interested, it is function 'SAP_WAPI_WORKITEMS_TO_OBJECT') was doing lots of processing that we didn't need, and that we could replace it with a join on tables swwwihead and sww_wi2obj. It did the trick: the dumps stopped and the top guys started getting their PO approvals again.
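For readers who want to picture the shape of that fix, here is a minimal sketch, in Python with an in-memory SQLite database standing in for the SAP tables. The real change was ABAP inside the BAdI and is not reproduced here. The column names (wi_id, wi_stat, instid, typeid), the function names and the sample rows are all assumptions made for illustration. The point is only that a single join between the object-link table and the work item header returns the handful of fields needed, rather than everything a general-purpose API call assembles.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory stand-in for the two workflow tables.

    Column names and rows are hypothetical, chosen only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE swwwihead (wi_id INTEGER PRIMARY KEY, wi_stat TEXT);
        CREATE TABLE sww_wi2obj (wi_id INTEGER, instid TEXT, typeid TEXT);
        INSERT INTO swwwihead VALUES (1, 'READY'), (2, 'COMPLETED'), (3, 'READY');
        INSERT INTO sww_wi2obj VALUES
            (1, '4500000001', 'BUS2012'),
            (2, '4500000001', 'BUS2012'),
            (3, '4500000002', 'BUS2012');
    """)
    return conn


def work_items_for_object(conn, instid, typeid):
    """Return (wi_id, wi_stat) for every work item linked to one object."""
    sql = """
        SELECT h.wi_id, h.wi_stat
        FROM sww_wi2obj AS o
        JOIN swwwihead AS h ON h.wi_id = o.wi_id
        WHERE o.instid = ? AND o.typeid = ?
        ORDER BY h.wi_id
    """
    return conn.execute(sql, (instid, typeid)).fetchall()


if __name__ == "__main__":
    conn = demo_connection()
    for row in work_items_for_object(conn, "4500000001", "BUS2012"):
        print(row)
```

Selecting only the columns actually used is what keeps the result small; in our case that was enough to stop the memory dump.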

The second problem was more of a long-running issue concerning invoice pricing. This was an "intractable" problem for which we already had an open OSS incident. Now was the time, with everyone away, to have a very detailed look at the problem. Each time I debugged, the problem seemed to arise in an SAP standard formula. No custom code was implicated. For sure, we did have a strange situation where we were taking the sales order in the sales unit, doing the picking in the base unit, and then converting back to the sales unit for billing, but that couldn't be causing the problem, could it? Again, dear reader, I took the easy route and informed SAP their pricing algorithm was wrong.

After days of debugging, sometimes entire days at a stretch, I was finally convinced: the problem is in the SAP code, I told the business analyst; it doesn't go near any user-exit code. So the OSS incident was updated and next day the reply came back: Dear customer, please check these OSS notes and tell us why you have modified standard modules. It advised us that we would have problems if we tried to apply notes to these modules, and listed the notes we should apply. The reply finished with a rather pointed comment, asking us to kindly answer if there was some Z code at pricing. Well, this was really throwing down the gauntlet, and my professional pride was piqued.

I was surprised at the modification in the standard include and had no idea why it was there. We set up an IDES system at (nearly) the same EhP level, and compared the code. It turned out that we'd been careless at the last upgrade during the SPAU phase, and had simply accepted the modifications for an old note instead of going back to the standard. I backed out the mods. We applied the two notes (and 96 prerequisites) and retested - no joy.

Much highly concentrated debugging followed, during which I had to anatomise function RV_INVOICE_CREATE and all its main forms and exits (maybe I'll share this in a future blog if there is sufficient demand). What I found was simple: it was the base-to-sales unit conversion after all, but this was being done at the wrong time and totally screwing up (to use the technical term) pricing.
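I won't reproduce the actual pricing logic here, but a toy example shows why the timing of a unit conversion matters so much. Everything in this sketch is hypothetical: the 12-pieces-per-case factor, the price, and the function names are invented for illustration and are not taken from our system or from SAP's code. The price is defined per sales unit (case), picking happens in the base unit (piece), and the invoice value is only right if the picked quantity is converted back to cases before it is multiplied by the price.

```python
from decimal import Decimal

# Hypothetical master data, for illustration only.
PIECES_PER_CASE = Decimal("12")    # 1 case (sales unit) = 12 pieces (base unit)
PRICE_PER_CASE = Decimal("24.00")  # condition price is per sales unit


def to_sales_unit(base_qty):
    """Convert a quantity in pieces to a quantity in cases."""
    return base_qty / PIECES_PER_CASE


def invoice_value(picked_base_qty, convert_before_pricing):
    """Price a picked quantity, with the conversion done early or too late.

    With convert_before_pricing=False the base-unit quantity is multiplied
    by a per-case price, which is the kind of mismatch a mistimed
    conversion can produce.
    """
    if convert_before_pricing:
        qty = to_sales_unit(picked_base_qty)
    else:
        qty = picked_base_qty
    return qty * PRICE_PER_CASE


if __name__ == "__main__":
    picked = Decimal("60")  # 5 cases ordered, 60 pieces picked
    print("converted first:", invoice_value(picked, True))
    print("converted late: ", invoice_value(picked, False))
```

With these made-up numbers the two results differ by exactly the conversion factor, which is the sort of discrepancy that looks like a bug in the pricing formula until you notice which unit the quantity is in when the formula runs.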

There are a few simple lessons to draw from these incidents:

1. If you have a problem and have modified standard code or have user-exits in the problem area, it's highly probable that your changes are causing the problem.

2. OSS works very well, particularly for Max Attention customers, as an 'expert of last resort' that can really help catalyse the resolution of problems with your SAP system.

3. Mistakes during SPAU can have long-lasting impacts.

4. IDES systems are very useful.

5. Best to go on holiday when everyone else does.
