“java.io.IOException: No locks available” error when using mvn

Tried to use mvn for compiling a Java library but got error:

[WARNING] Failed to read tracking file /adminhome/yandong/.m2/repository/org/apache/maven/plugins/maven-resources-plugin/2.6/_remote.repositories
java.io.IOException: No locks available
at sun.nio.ch.FileDispatcherImpl.lock0(Native Method)
at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:90)
at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1021)

 

turned out it’s about NFS lock service since the directory is mounted through network. The Linux distribution at work is customized and there’s no way to install the service as some Google search suggested. Solution: point mvn’s repository directory to some local space and the problem is gone. To be specific, in file apache-maven-${version}/conf/settings.xml: uncomment and update line:

<localRepository>some_local_directory</localRepository>

and re-run mvn.

Sorting in Linux

Background of the story: recently I needed to run random walk on search click graph. The idea is similar to pagerank but the algorithm works on a bipartite graph: on the left-hand side it’s all the queries, and on the other it’s the URLs. Two sides are connected via user clicks and the edges can be weighted or not weighted.

The problem: engineers are not very capable here. I was given data of (cookie, query, URL-clicked) and I had to aggregate the data myself (basically discard cookies) which is 5000 50M files.

My attempt: apparently it’s a practical application of merge-sort the only way I can think of that works on multiple large files. So I first output every 50 50M partitions into one 2.5G file, and I generated 50 such files. Then I brilliantly sort each 2.5 file locally using ‘sort -k2,3 -t\tab’. Done overnight. Awesome. Then I wrote a small Python script to merge sort all of them. Idea is simple: use a priority-queue to manage all opened file handles, and intelligently choose the next file handle to read in next line. If the key (query, URL) is same, increment current count, otherwise, output and reset the key and counter. Pretty straightforward. Well turns out it didn’t give me the merged result. After some debugging, the root cause is linux sort and Python sort treat strings differently (at least the default comparing criteria).

E.g. first few lines from linux sort:

1) 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 http://www.westminsterkennelclub.org/history/bis/ad.html?refresh=00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 5
2) 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 http://www.westminsterkennelclub.org/history/bis/ad.html?refresh=00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 5
3) 00000000000000speed test jsperf.com/test-speed-for

first lines of Python sort:

1) ! arxi mytelevizia.com/pirveli-arxi.html 1
2) ! http://howtomakecandleseasy.com/freelesson.html http://www.misstutu.com/freelesson.html 1
3) ! http://www.roblox.com/marine-corps-base-hawaii-kaneohe-bay-place?id=132251748 http://www.roblox.com/usmc-marine-corps-air-station-kaneohe-bay-hawaii-place?id=129108403 1

Here I speculate Python first converts the characters to their ASCII values and follows a numeric sorting.

I don’t know how exactly linux sort works. In the file I saw “000” followed by “$0.00” followed by another “000”. It could be that certain chars are ignored. I’ll update this post when it’s more clear to me.

p.s. try this: make a file of following 3 lines:

000 000 0000 phone calls
$0 : 00000000, at : 416f0000, v0 : 416f0000, v1 : 42332e20
00000001

and run sort.