Discussion:
(BUG) ShortCircuitLocalReads fails when replication is enabled
Ming Yang
2016-08-11 07:46:41 UTC
Our cluster has short-circuit local reads enabled:
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>

After enabling replication, we found a large number of errors in the logs. Each read goes through two steps:
1. A short-circuit local read is attempted (fails every time).
2. The read is retried via the datanode on targetAddr (succeeds).
How can we make short-circuit local reads succeed when replication is enabled?

2016-08-03 10:46:21,721 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication dn7%2C60020%2C1470136216957.1470192327030 at 16999670
2016-08-03 10:46:21,723 WARN org.apache.hadoop.hdfs.DFSClient: BlockReaderLocal requested with incorrect offset: Offset 0 and length 17073479 don't match block blk_4137524355009640437_53760530 ( blockLen 16999670 )
2016-08-03 10:46:21,723 WARN org.apache.hadoop.hdfs.DFSClient: BlockReaderLocal: Removing blk_4137524355009640437_53760530 from cache because local file /sdd/hdfs/dfs/data/blocksBeingWritten/blk_4137524355009640437 could not be opened.
2016-08-03 10:46:21,724 INFO org.apache.hadoop.hdfs.DFSClient: Failed to read block blk_4137524355009640437_53760530 on local machinejava.io.IOException: Offset 0 and length 17073479 don't match block blk_4137524355009640437_53760530 ( blockLen 16999670 )
    at org.apache.hadoop.hdfs.BlockReaderLocal.<init>(BlockReaderLocal.java:287)
    at org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:171)
    at org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:358)
    at org.apache.hadoop.hdfs.DFSClient.access$800(DFSClient.java:74)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2073)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2224)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:178)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:734)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:574)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:364)
2016-08-03 10:46:21,724 INFO org.apache.hadoop.hdfs.DFSClient: Try reading via the datanode on /192.168.7.139:50010
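
The numbers in the warning can be read off directly. Below is a minimal sketch in Python, assuming (the thread does not confirm this) that the local reader rejects any request whose offset plus length extends past the block length it finds on local disk. The helper name `overshoot` is made up for illustration. It shows that the client asked for 73809 bytes more than the 16999670 bytes in the local block file; 16999670 is also the position reported in the "Opening log for replication ... at 16999670" line.

```python
import re

# Warning text copied from the DFSClient log above.
WARNING = (
    "BlockReaderLocal requested with incorrect offset: Offset 0 and length "
    "17073479 don't match block blk_4137524355009640437_53760530 "
    "( blockLen 16999670 )"
)

# Matches the "Offset ... and length ... don't match block ..." message format.
PATTERN = re.compile(
    r"Offset (?P<offset>\d+) and length (?P<length>\d+) don't match block "
    r"(?P<block>blk_-?\d+_\d+) \( blockLen (?P<block_len>\d+) \)"
)


def overshoot(message):
    """Return how many bytes the requested range extends past blockLen.

    Returns None if the message does not look like the warning above.
    (Hypothetical helper for reading the log, not part of Hadoop.)
    """
    match = PATTERN.search(message)
    if match is None:
        return None
    requested_end = int(match["offset"]) + int(match["length"])
    return requested_end - int(match["block_len"])


print(overshoot(WARNING))  # 17073479 - 16999670 = 73809
```

The local file sits under blocksBeingWritten, so one possible reading, not confirmed anywhere in this thread, is that the log file was still being written when the replication reader opened it.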
Dima Spivak
2016-08-11 14:32:39 UTC
Hey Yang,

Looks like HDFS is having trouble with a block. Have you tried running
hadoop fsck?

-Dima
yangming860101
2016-08-16 04:21:22 UTC
/*Are you seeing it for all the log files? The local ReplicationSource handles
the WALs for replication from this RS. But it may happen that an RS went down
and another RS took charge of replicating the WAL files that originated at the
downed RS. Those WALs might not be local to the RS now replicating them, so
you may see SCR not happening. As added logs here I just commented.*/

Dima Spivak, no RS went down. All server logs look the same:
HDFS log (no errors logged):
2016-08-12 14:25:49,902 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.7.139:50010, dest: /192.168.7.139:55856, bytes: 66048, op: HDFS_READ, cliID: DFSClient_hb_rs_dn7,60020,1470216726863, offset: 0, srvID: DS-1014379950-192.168.7.139-50010-1416802300444, blockid: blk_-4614053595081055029_54130616, duration: 252426

The src and dest have the same IP.
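
The same-IP observation can be checked mechanically across many clienttrace entries. This is a rough sketch in Python, not a Hadoop tool: the function names `endpoints` and `is_same_host_read` are made up for illustration, and the regular expression only assumes the `src: /host:port, dest: /host:port` layout visible in the log line above.

```python
import re

# One clienttrace entry from the DataNode log above, joined onto a single line.
LINE = (
    "2016-08-12 14:25:49,902 INFO "
    "org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: "
    "/192.168.7.139:50010, dest: /192.168.7.139:55856, bytes: 66048, op: "
    "HDFS_READ, cliID: DFSClient_hb_rs_dn7,60020,1470216726863, offset: 0, srvID: "
    "DS-1014379950-192.168.7.139-50010-1416802300444, blockid: "
    "blk_-4614053595081055029_54130616, duration: 252426"
)

# Captures the field name, host and port of the src and dest endpoints.
ENDPOINT = re.compile(r"(src|dest): /([\d.]+):(\d+)")


def endpoints(line):
    """Map 'src' and 'dest' to (host, port) tuples found in a clienttrace line."""
    return {name: (host, int(port)) for name, host, port in ENDPOINT.findall(line)}


def is_same_host_read(line):
    """True when an HDFS_READ went over a socket between two ports on one host."""
    found = endpoints(line)
    return (
        "op: HDFS_READ" in line
        and "src" in found
        and "dest" in found
        and found["src"][0] == found["dest"][0]
    )


print(endpoints(LINE))
print(is_same_host_read(LINE))
```

A read that shows up here with matching hosts went through the datanode's TCP port (50010 in this log) even though the client runs on the same machine, which matches the "Try reading via the datanode" fallback in the HBase log.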



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/BUG-ShortCircuitLocalReads-Failed-when-enabled-replication-tp4081733p4081820.html
Sent from the HBase User mailing list archive at Nabble.com.
Ted Yu
2016-08-11 15:40:49 UTC
What's the value for dfs.domain.socket.path?

See explanation in http://hbase.apache.org/book.html for the meaning of
this config.

Cheers
yangming860101
2016-08-16 04:18:37 UTC
dfs.domain.socket.path is not configured. Short-circuit local reads do not
fail like this anywhere else, only for log files (replication).

HDFS version: 1.2.1
<property>
<name>dfs.block.local-path-access.user</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>

HBase version: 0.94.20
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>hbase.replication</name>
<value>true</value>
</property>
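
To confirm which of the short-circuit settings discussed in this thread are actually present in a site file, something like the following Python sketch could be used. It is only an illustration: `read_properties` and `report` are hypothetical helper names, and the `<configuration>` wrapper around the fragments quoted above is an assumption about how the real hdfs-site.xml is laid out.

```python
import xml.etree.ElementTree as ET

# The HDFS property fragments quoted above, wrapped in a <configuration> root
# so they parse as one document. The wrapper is an assumption about the real file.
HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.block.local-path-access.user</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
</configuration>
"""

# Keys mentioned in this thread.
KEYS = (
    "dfs.client.read.shortcircuit",
    "dfs.block.local-path-access.user",
    "dfs.domain.socket.path",
)


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element in the XML."""
    root = ET.fromstring(xml_text)
    return {
        (prop.findtext("name") or "").strip(): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }


def report(xml_text, keys=KEYS):
    """Map each key of interest to its configured value, or None if unset."""
    props = read_properties(xml_text)
    return {key: props.get(key) for key in keys}


for key, value in report(HDFS_SITE).items():
    print(f"{key} = {value if value is not None else '(not set)'}")
```

For the fragments shown, this reports dfs.domain.socket.path as not set, consistent with the statement at the top of this message.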



yangming860101
2016-08-16 04:20:27 UTC
HBase log:
2016-08-12 14:25:56,738 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication dn7%2C60020%2C1470216726863.1470983077165 at 28175754
2016-08-12 14:25:56,740 WARN org.apache.hadoop.hdfs.DFSClient: BlockReaderLocal requested with incorrect offset: Offset 0 and length 28186400 don't match block blk_-4614053595081055029_54130616 ( blockLen 28175754 )
2016-08-12 14:25:56,740 WARN org.apache.hadoop.hdfs.DFSClient: BlockReaderLocal: Removing blk_-4614053595081055029_54130616 from cache because local file /sdf/hdfs/dfs/data/blocksBeingWritten/blk_-4614053595081055029 could not be opened.
2016-08-12 14:25:56,740 INFO org.apache.hadoop.hdfs.DFSClient: Failed to read block blk_-4614053595081055029_54130616 on local machinejava.io.IOException: Offset 0 and length 28186400 don't match block blk_-4614053595081055029_54130616 ( blockLen 28175754 )
    at org.apache.hadoop.hdfs.BlockReaderLocal.<init>(BlockReaderLocal.java:287)
    at org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:171)
    at org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:358)
    at org.apache.hadoop.hdfs.DFSClient.access$800(DFSClient.java:74)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2073)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2224)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:178)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:734)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:574)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:364)
2016-08-12 14:25:56,740 INFO org.apache.hadoop.hdfs.DFSClient: Try reading via the datanode on /192.168.7.139:50010

HDFS log (no errors logged):
2016-08-12 14:25:49,902 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.7.139:50010, dest: /192.168.7.139:55856, bytes: 66048, op: HDFS_READ, cliID: DFSClient_hb_rs_dn7,60020,1470216726863, offset: 0, srvID: DS-1014379950-192.168.7.139-50010-1416802300444, blockid: blk_-4614053595081055029_54130616, duration: 252426
2016-08-12 14:25:49,964 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.7.139:50010, dest: /192.168.7.139:55858, bytes: 198144, op: HDFS_READ, cliID: DFSClient_hb_rs_dn7,60020,1470216726863, offset: 0, srvID: DS-1014379950-192.168.7.139-50010-1416802300444, blockid: blk_-4614053595081055029_54130616, duration: 329364



yangming860101
2016-08-16 04:30:16 UTC
Online cluster configuration:
hdfs-site.xml
<http://apache-hbase.679495.n3.nabble.com/file/n4081821/hdfs-site.xml>
hbase-site.xml
<http://apache-hbase.679495.n3.nabble.com/file/n4081821/hbase-site.xml>



yangming860101
2016-08-16 04:32:09 UTC
Online cluster configuration:
hdfs-site.xml
<http://apache-hbase.679495.n3.nabble.com/file/n4081822/hdfs-site.xml>
hbase-site.xml
<http://apache-hbase.679495.n3.nabble.com/file/n4081822/hbase-site.xml>



yangming860101
2016-08-22 07:31:27 UTC
Can anybody test this again to verify whether there is a problem?
Thanks!


