"enq: TX - row lock contention" is often assumed to be an application issue, but that’s not always the case. In this post, I highlight how this wait event can also originate from the database layer due to: - Misconfiguration. - Data inconsistencies. 📌Problem Statement: One of the critical application jobs has experienced significant latency, remaining in a suspended state for the past two days. The session was stuck on the wait event "enq: TX - row lock contention". Historically, this job completes within 5 minutes; however, over the last 48 hours, the execution time has increased to approximately 28 minutes, requiring manual termination by the application team. This delay has directly impacted the daily business operations. The application team has confirmed that no changes have been implemented on the application side. As a temporary workaround, the application team manually executed the job in an alternative environment to remediate the data. From a DBA perspective, the wait event "enq: TX - row lock contention" is typically considered an application-driven issue. It usually arises due to transaction handling and data access patterns defined within the application logic. Therefore, the application team should review the code and transaction design associated with the affected job. 🔍 What Do You Think? What Could Be the Issue? 🛠️ Let’s Troubleshoot the Issue... Analysis of the v$session log reveals that while the job was launched at 04:41:02 AM, the specific query (SQL_ID: ftxu44fqbahm5) entered the "enq: TX - row lock contention" wait state exactly three minutes later at 04:44:08 AM. This establishes a clear timeline of the bottleneck, which can now be cross-referenced with historical session patterns to pinpoint deviations from normal operations. The session log for April 23rd reveals a recurring pattern where the job became stuck on the "enq: TX - row lock contention" wait event, forcing the application team to abort the process. Further investigation is underway on the records for April 22nd to determine whether the same bottleneck was present. While "enq: TX - row lock contention" is traditionally viewed as an application-level logic issue requiring a code review, the current situation deviates from that norm because the application team has formally confirmed that no changes or deployments were made to the system during this period. 🤝 Let’s have a discussion with the application team. The application team has noted a significant change in behavior where the problematic query is returning incorrectly ordered data, a deviation from its previous performance that suggests a potential shift in how the database is fetching or sorting records despite no direct code modifications. Previously, the data was returned in the correct order, based on which the application assigned sequence numbers to the output data and performed further processing to complete the job. Let's look at the query and the query execution plan. 
The session log for April 23rd reveals a recurring pattern: the job became stuck on the same "enq: TX - row lock contention" wait event, forcing the application team to abort the process. Further investigation is underway on the records for April 22nd to determine whether the same bottleneck was present. While "enq: TX - row lock contention" is traditionally viewed as an application-level logic issue requiring a code review, the current situation deviates from that norm because the application team has formally confirmed that no changes or deployments were made to the system during this period.

🤝 Let's have a discussion with the application team.

The application team noted a significant change in behavior: the problematic query now returns incorrectly ordered data. Previously, the data came back in the correct order; the application assigned sequence numbers to the output in that order and performed further processing to complete the job. This deviation suggests a shift in how the database is fetching or sorting records, despite no direct code modifications.

Let's look at the query and the query execution plan.

SQL_ID: ftxu44fqbahm5
Query Text:

select rnum, key
  from (select /*+ parallel(APTB,4) */
               inst_no || acc_no key,
               row_number() over (order by inst_no, acc_no, ROWID) rnum
          from APTB)
 where rnum in (262694, 525388, 788082, 1050776, 1313470, 1576164, 1838858,
                2101552, 2364246, 2626940, 2889634, 3152328, 3415022, 3677716,
                3940410, 4203104, 4465798, 4728492, 4991186);

As you can see from the result set below, the rnum values come back unordered, which is incorrect from the application's perspective and leads to data mismatches.

Note: I have used asterisk marks (*) to mask the data, as it contains sensitive information and cannot be fully shared.

Query Output:

    RNUM KEY
-------- -------------------
 3677716 0022000004********1
 3415022 0022000004********2
  525388 0011000001********3
  788082 0011000002********4
 3152328 0022000004********5
 1050776 0011000002********9
 1576164 0011000002********4
 2889634 0022000003********5
 3940410 0022000004********8
 1313470 0011000002********7
  262694 0011000001********5
 4728492 0022000005********5
 4991186 0022000005********9
 2364246 0011000003********6
 4465798 0022000004********2
 1838858 0011000003********3
 2101552 0011000003********2
 2626940 0022000003********9
 4203104 0022000004********8

19 rows selected.

From the execution plan of this run, it was observed that the optimizer is choosing a parallel execution plan. Let's remove the parallel hint from the query and re-evaluate the execution plan.

SQL> select rnum, key
       from (select inst_no || acc_no key,
                    row_number() over (order by inst_no, acc_no, ROWID) rnum
               from APTB)
      where rnum in (262694, 525388, 788082, 1050776, 1313470, 1576164, 1838858,
                     2101552, 2364246, 2626940, 2889634, 3152328, 3415022, 3677716,
                     3940410, 4203104, 4465798, 4728492, 4991186);

From this test, it is evident that even after removing the parallel hint, the optimizer continues to choose a parallel execution plan.

➡️ What is next then?

As a next step, the same query was tested in an alternate environment containing data as of 22nd April. In that environment, the query executed as expected without any issues: the optimizer selected an appropriate execution plan, and the application team confirmed that the output was returned in the correct order. This comparison indicates that the issue is environment-specific and may be related to differences such as data volume, statistics, indexing, or database configuration between the two environments.

SQL> explain plan for
     select rnum, key
       from (select inst_no || acc_no key,
                    row_number() over (order by inst_no, acc_no, ROWID) rnum
               from APTB)
      where rnum in (262694, 525388, 788082, 1050776, 1313470, 1576164, 1838858,
                     2101552, 2364246, 2626940, 2889634, 3152328, 3415022, 3677716,
                     3940410, 4203104, 4465798, 4728492, 4991186);

Explained.

Note: the data is masked with asterisks here as well.

Query Output:

    RNUM KEY
-------- ------------------
  262694 0022000002********
  525388 0011000001********
  788082 0011000001********
 1050776 0011000001********
 1313470 0011000002********
 1576164 0011000002********
 1838858 0022000004********
 2101552 0022000004********
 2364246 0022000003********
 2626940 0022000004********
 2889634 0011000002********
 3152328 0011000001********
 3415022 0011000001********
 3677716 0011000001********
 3940410 0011000002********
 4203104 0022000004********
 4465798 0022000003********
 4728492 0022000002********
 4991186 0022000003********

19 rows selected.

From this result set, you can see that the query output is in the correct order.
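As a side note, the plan actually used by the cursor in each environment can be pulled from the cursor cache with the standard DBMS_XPLAN package. A minimal sketch (in a parallel plan you will see PX operations such as PX COORDINATOR and PX SEND/RECEIVE):

SQL> select * from table(dbms_xplan.display_cursor('ftxu44fqbahm5', null, 'TYPICAL'));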
➡️ What is next? What can be the issue?

Check the degree of parallelism assigned to the problematic table and its indexes in both the good and bad environments.

Good Environment:

SQL> select index_name, degree from dba_indexes where index_name = 'APTBPK';

INDEX_NAME      DEGREE
--------------- --------
APTBPK          1

Bad Environment:

SQL> select index_name, degree from dba_indexes where index_name = 'APTBPK';

INDEX_NAME      DEGREE
--------------- --------
APTBPK          64

📊 Observation:

From the above output, it is evident that the degree of parallelism (DOP) for the problematic index is set to 64 in the affected (bad) environment, whereas in the healthy (good) environment it is set to 1.

❓ Key Questions:
- How did this change occur in the production database?
- Who modified the parallel degree setting?

🤝 Next Step:

Let us engage with the customer to check whether any activities or changes were carried out around 22nd April that could have led to this behavior. After discussion with the customer, it was confirmed that a planned reorganization (re-org) activity was performed on table APTB.

🛠️ Solution:

1) Set the index back to serial execution by removing parallelism:

SQL> alter index APPSUSER.APTBPK noparallel;
SQL> select index_name, degree from dba_indexes where index_name = 'APTBPK';

2) Monitor the next run of the job.

✅ Final Conclusion:

The application job completed successfully after correcting the index degree (DOP) setting. The optimizer selected the appropriate execution plan, and the query returned data in the correct order.
🚫 No further occurrences of "enq: TX - row lock contention" were observed.
✔️ The same has been validated and confirmed by the application team.

🔍 Root Cause:

During the table and index reorganization, the degree of parallelism (DOP) for the affected index was set to a higher value (64) and was not reset back to its original value.
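This is a classic side effect of index maintenance: rebuilding an index with a PARALLEL clause persists that degree as an attribute of the index until it is explicitly reset. A sketch of how this likely played out during the re-org (the rebuild command itself is my assumption; only the resulting DEGREE of 64 was confirmed):

-- Assumed step during the re-org: parallel rebuild to speed up index creation
SQL> alter index APPSUSER.APTBPK rebuild parallel 64;

-- Side effect: PARALLEL 64 persists on the index, so the optimizer starts
-- costing parallel plans for statements that touch it
SQL> select index_name, trim(degree) as degree
       from dba_indexes where index_name = 'APTBPK';

-- The step that was missed: reset the DOP once the rebuild completes
SQL> alter index APPSUSER.APTBPK noparallel;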
⚠️ This led to:

With the index carrying a DOP of 64, the optimizer chose a parallel execution plan even without the hint, and the parallel run returned rows in a different order. Note that the outer query has no ORDER BY, so the final row order was never actually guaranteed; serial execution simply happened to return the rows in rnum order, and parallel execution broke that implicit assumption. Since the application job assigned sequence numbers to the result set and generated a new sequence from it, the unordered results led to data mismatches. This, in turn, caused transaction inconsistencies in the database, resulting in the "enq: TX - row lock contention" issues observed at the database level.
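As a preventive takeaway, after any re-org or rebuild it is worth auditing for objects left with a non-default DOP. A simple sketch (DEGREE is stored as a space-padded string in these views, hence the TRIM; adjust the filters to the schemas you care about):

SQL> select owner, table_name as object_name, trim(degree) as degree
       from dba_tables
      where trim(degree) not in ('1', 'DEFAULT')
     union all
     select owner, index_name, trim(degree)
       from dba_indexes
      where trim(degree) not in ('1', 'DEFAULT');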
🎉 Enjoy the troubleshooting journey!
📝 Stay tuned for a detailed blog post on this case!

Thanks for reading this post! Please comment if you liked it, and click FOLLOW to get updates on the next blog.