Project

General

Profile

Bug #8978

Gratia Backlog Information Extremly behind

Added by Nicholas Peregonow almost 4 years ago. Updated almost 4 years ago.

Status:
Accepted
Priority:
Normal
Start date:
07/20/2015
Due date:
08/01/2015
% Done:

30%

Estimated time:
Spent time:
Duration: 13

Description

When looking at https://gratia-osg-prod.opensciencegrid.org/gratia-administration/backlog.html, under local collector table, the collector "collector:http://gratia-osg-prod.opensciencegrid.org:80" is displaying "losing ground, by 12,738,392,739.1days". What exactly is this referring to, and is there way to go about fixing this or having it display correctly?

The new collector gratia-main-osg is also displaying some large amount of days behind as well. It is currently listing itself behind "992,987,205.1"

Filed - RITM0215971


Related issues

Follows Gratia - Task #8977: Enable data forwarding to new collectorsClosed2015-05-292015-06-03

Precedes (1 day) Gratia - Task #8982: Move collector IPs to DNS aliases that point to the new gratia collectorsAssigned2015-08-032015-08-05

Follows (10 days) Gratia - Milestone #9351: Resolve Database issuesNew2015-07-022015-07-09

History

#1 Updated by Nicholas Peregonow almost 4 years ago

  • Tracker changed from Bug to Task

#2 Updated by Nicholas Peregonow almost 4 years ago

  • Follows Task #8977: Enable data forwarding to new collectors added

#3 Updated by Gerard Bernabeu Altayo almost 4 years ago

There is some problem with the DB and/or hibernate (part of Gratia SW):

From: Tanya Levshina <>
To: Nicholas Peregonow <>, Kevin Retzke <>,
Svetlana G Lebedeva <>, Olga Vlasova <>
CC: Joseph B Boyd <>, Mitchell Renfer <>, Gerard
Bernabeu Altayo <>
Subject: bulk deletion
Thread-Topic: bulk deletion
Thread-Index: AQHQm/P4n39U/ugQRUetbHghNEjS4g==
Date: Sun, 31 May 2015 22:48:52 +0000
Message-ID: <>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-Exchange-Inbox-Rules-Loop:
X-MS-TNEF-Correlator:
x-originating-ip: [50.158.252.240]
Resent-From: <>
Content-Type: multipart/alternative;
boundary="_000_85b3432dcb63482c84a5d230acc39bc3MAIL04VCAS03fnalgov_"
MIME-Version: 1.0
Return-Path: prvs=95933DD729=

--_000_85b3432dcb63482c84a5d230acc39bc3MAIL04VCAS03fnalgov_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi,

I am trying to understand what is happening with gratia databases instances=
installed with mysql 5.6. Keep in mind that we didn't experience this prob=
lem before and gratia is running pretty much unchanged for the last 7 years=
.

So, it is definitely related to mysql5.6 and hibernate 4.

mysql> show open tables where In_Use > 0 ;
---------------------------+--------+-------------+ | Database | Table | In_use | Name_locked |
---------------------------+--------+-------------+ | gratia | ComputeElement | 1 | 0 | | gratia | TableStatistics | 1 | 0 | | gratia_itb | TableStatistics | 1 | 0 | | gratia_itb | ComputeElement | 1 | 0 |
---------------------------+--------+-------------+

2015-05-31 16:13:47,340 org.hibernate.engine.jdbc.spi.SqlExceptionHelper(T=
hread-148) [WARN]: SQL Error: 1205, SQLState: 41000
2015-05-31 16:13:47,341 org.hibernate.engine.jdbc.spi.SqlExceptionHelper(Th=
read-148) [ERROR]: Lock wait timeout exceeded; try restarting transaction
and show processlist
29616870 gratia gratia-main-osg.fnal.gov:57486 gratia =
Query 37784 Sending data =
/* bulk delete */ delete from ComputeElement w=
here (dbid) IN (select dbid from HT_ComputeElement)
29648548 gratia gratia-itb-osg.fnal.gov:56603 gratia_itb =
Query 6309 Sending data =
/* bulk delete */ delete from ComputeElement w=
here (dbid) IN (select dbid from HT_ComputeElement)

------- TRX HAS BEEN WAITING 4 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1099 page no 3 n bits 96 index `index1` of table `gra=
tia_itb`.`TableStatistics` trx id 770537878 lock mode S waiting
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info b=
its 0
0: len 7; hex 63757272656e74; asc current;;
1: len 14; hex 436f6d70757465456c656d656e74; asc ComputeElement;;
2: len 0; hex ; asc ;;
3: len 6; hex 00002d69e2f0; asc -i ;;
4: len 7; hex 3a000195830765; asc : e;;
5: len 8; hex 8000000000829e2e; asc .;;

/* bulk delete */ delete from ComputeElement where (dbid) IN (select dbid f=
rom HT_ComputeElement) |

is generated by hibernate not by gratia code; " HT_ComputeElement" is a tem=
porary table created by hibernate while doing bulk deletion.

I will need more time to debug this; unfortunately my next week looks prett=
y bad. Let's see if Kevin can find sometime.

If you have any idea your input is welcome.

Thanks,
Tanya

--_000_85b3432dcb63482c84a5d230acc39bc3MAIL04VCAS03fnalgov_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<html dir=3D"ltr">
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Diso-8859-=
1">
<style id=3D"owaParaStyle" type=3D"text/css">=0A=
<!--=0A=
p=0A= {margin-top:0;=0A=
margin-bottom:0}=0A=
-->=0A=
P {margin-top:0;margin-bottom:0;}</style>
</head>
<body ocsi=3D"0" fpstyle=3D"1">
<div style=3D"direction: ltr;font-family: Tahoma;color: #000000;font-size: =
10pt;"><br>
<div style=3D"font-family: Times New Roman; color: #000000; font-size: 16px=
">
<div></div>
Hi,<br>
<div>
<div style=3D"direction:ltr; font-family:Tahoma; color:#000000; font-size:1=
0pt"><br>
I am trying to understand what is happening with gratia databases instances=
installed with mysql 5.6. Keep in mind that we didn't experience this prob=
lem before and gratia is running pretty much unchanged for the last 7 years=
.<br>
<br>
So, it is definitely related to mysql5.6 and hibernate 4.<br>
<br>
mysql> show open tables where In_Use > 0 ;<br>
+------------+-----------------+--------+-------------+=
<br> | Database   | Table       &nb=
sp;   | In_use | Name_locked |<br>
+------------+-----------------+--------+-------------+=
<br> | gratia     | ComputeElement  |   =
   1 |         &nbsp=
; 0 | <br> | gratia     | TableStatistics |   &nbsp=
;  1 |           0 |=
<br> | gratia_itb | TableStatistics |      1 | &nb=
sp;         0 | <br> | gratia_itb | ComputeElement  |      1 |&nbs=
p;          0 | <br>
+------------+-----------------+--------+-------------+=
<br>
<br>
<br>
 2015-05-31 16:13:47,340 org.hibernate.engine.jdbc.spi.SqlExceptionHel=
per(Thread-148) [WARN]: SQL Error: 1205, SQLState: 41000<br>
2015-05-31 16:13:47,341 org.hibernate.engine.jdbc.spi.SqlExceptionHelper(Th=
read-148) [ERROR]: Lock wait timeout exceeded; try restarting transaction<b=
r>
<br>
<br>
and show processlist <br> | 29616870 | gratia  | gratia-main-osg.fnal.gov:57486  &nbsp=
;    | gratia       &nbsp=
;      | Query      = |  37784 | Sending data       &nbs=
p;            &=
nbsp;           &nbs=
p;            &=
nbsp;            | /=
  • bulk delete */ delete from ComputeElement where (dbid) IN (select dbid fr=
    om
    HT_ComputeElement) | <br> | 29648548 | gratia  | gratia-itb-osg.fnal.gov:56603   =
         | gratia_itb      &n=
    bsp;   | Query       |   =
    6309 | Sending data         &n=
    bsp;           &nbsp=
    ;            &n=
    bsp;           &nbsp=
    ;           | /* bulk del=
    ete / delete from ComputeElement where (dbid) IN (select dbid from
    HT_ComputeElement) | <br>
    <br>
    <br>
    <br>
    ------- TRX HAS BEEN WAITING 4 SEC FOR THIS LOCK TO BE GRANTED:<br>
    RECORD LOCKS space id 1099 page no 3 n bits 96 index `index1` of table `gra=
    tia_itb`.`TableStatistics` trx id 770537878 lock mode S waiting<br>
    Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info b=
    its 0<br>
     0: len 7; hex 63757272656e74; asc current;;<br>
     1: len 14; hex 436f6d70757465456c656d656e74; asc ComputeElement;;<br>
     2: len 0; hex ; asc ;;<br>
     3: len 6; hex 00002d69e2f0; asc   -i  ;;<br>
     4: len 7; hex 3a000195830765; asc :     e;;<br>
     5: len 8; hex 8000000000829e2e; asc     &nbs=
    p;  .;;<br>
    <br>
    <br>
    <br>
    <br>
    <br>
    /
    bulk delete */ delete from ComputeElement where (dbid) IN (select dbid f=
    rom HT_ComputeElement) |<br>
    <br>
    is generated by hibernate not by gratia code; " HT_ComputeElement&quot=
    ; is a temporary table created by hibernate while doing bulk deletion.<br>
    <br>
    I will need more time to debug this; unfortunately my next week looks prett=
    y bad. Let's see if Kevin can find sometime.<br>
    <br>
    If you have any idea your input is welcome.<br>
    <br>
    Thanks,<br>
    Tanya<br>
     <br>
    </div>
    </div>
    </div>
    </div>
    </body>
    </html>

--_000_85b3432dcb63482c84a5d230acc39bc3MAIL04VCAS03fnalgov_--

#4 Updated by Nicholas Peregonow almost 4 years ago

Turned off the gratia-itb-osg collector and stopped forwarding.

Kevin then ran this query on the gratia-itb database on gratiadb02

delete ce, cem from ComputeElement ce join ComputeElement_Meta cem on ce.dbid=cem.dbid
where Timestamp < (UTC_TIMESTAMP() - INTERVAL 6 MONTH) and ServerDate < (UTC_TIMESTAMP() - INTERVAL 6 MONTH);

Undergoing testing to see if this fixes the issue with long delete queries from hibernate

#5 Updated by Nicholas Peregonow almost 4 years ago

Delete query ran, but hibernate is still running log queries in an attempt to delete data.

#6 Updated by Nicholas Peregonow almost 4 years ago

  • Tracker changed from Task to Bug

#7 Updated by Nicholas Peregonow almost 4 years ago

Spoke with Kevin, and it appears to be a problem with a patch missing from Hibernate. He will review patch to see if this will work with new version of hibernate.

#8 Updated by Gerard Bernabeu Altayo almost 4 years ago

  • Precedes Task #8982: Move collector IPs to DNS aliases that point to the new gratia collectors added

#9 Updated by Gerard Bernabeu Altayo almost 4 years ago

  • Due date changed from 06/04/2015 to 06/30/2015

#10 Updated by Nicholas Peregonow almost 4 years ago

  • Status changed from Assigned to Accepted
  • % Done changed from 0 to 30

Kevin is still in the process of debugging hibernate. He has informed me that at one point there was patch for this, but it was rejected by upstream. He will be checking to see if the Gratia code can be made to work with hibernate, and therefore not rely on a patch that exist outside upstream. If this can not be done, he will look into patching hibernate to get the correct behavior

#11 Updated by Nicholas Peregonow almost 4 years ago

Gratiadb02 and gratiadb03 replication fell out of sync. This has required that svetlana do a manual sync of the databases. During this time, forwarding from the old collectors to the new collectors has been suspended. Because of this, I have not switched the password of the reader account.

#12 Updated by Nicholas Peregonow almost 4 years ago

From Svetlana

David still working on recovering the slave, the issue - we restored fresh backup from the master on gratiadb03 but slave does not see some tables although the physical files are on the disk, and command mysql> desc tablename
does show the table description, but when you try to select from table it says "table ... does not exist"

David is looking right now the explanation and solution for this situation, we definitely have to understand this behavior in mysql v5.6.

#13 Updated by Nicholas Peregonow almost 4 years ago

  • Due date changed from 06/30/2015 to 07/10/2015

#14 Updated by Nicholas Peregonow almost 4 years ago

#15 Updated by Nicholas Peregonow almost 4 years ago

#16 Updated by Nicholas Peregonow almost 4 years ago

#17 Updated by Gerard Bernabeu Altayo almost 4 years ago

  • Due date changed from 08/25/2015 to 08/01/2015

#18 Updated by Nicholas Peregonow almost 4 years ago

Fowarding data was enabled again, but after 2 days of this, it appears db's are out of sync again. Filed a ticket with DBA's to verify if this is the case.



Also available in: Atom PDF