# robot-grasp-detection

**Repository Path**: soldatjiang/robot-grasp-detection

## Basic Information

- **Project Name**: robot-grasp-detection
- **Description**: Detecting robot grasping positions with deep neural networks. The model is trained on the Cornell Grasping Dataset. This is an implementation mainly based on the paper 'Real-Time Grasp Detection Using Convolutional Neural Networks' from Redmon and Angelova.
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 5
- **Forks**: 1
- **Created**: 2019-12-02
- **Last Updated**: 2022-12-02

## Categories & Tags

**Categories**: machine-learning

**Tags**: None

## README

# Detecting grasping positions with deep neural networks using RGB images

_(The model is uploaded, but you can train a better one yourself if you have the time and the machine, or if you are learning TensorFlow/ML. Please bear in mind that you need to read some parts of the code and adapt them to your needs. Feel free to open an issue if you need help. I will try to update the README and comment the code.)_

This implementation is mainly based on the algorithm from Redmon and Angelova described in [arXiv:1412.3128v2](https://arxiv.org/abs/1412.3128). The method uses an RGB image to find a single grasp. A deep convolutional neural network is applied to an image of an object, and as a result one gets the coordinates, dimensions, and orientation of one possible grasp. The images used to train the network are from the [Cornell Grasping Dataset](http://pr.cs.cornell.edu/grasping/rect_data/data.php).

### Problem description

Having in mind a parallel plate gripper before it closes, a simple and natural way of picturing a grasping position in an image is a rectangle (see figure 1). One way of representing it uniquely is as g = {x, y, \theta, h, w}, where (x, y) is the center of the rectangle, \theta is the orientation of the rectangle relative to the horizontal axis of the image, and _h_ and _w_ are the dimensions (height and width) of the rectangle.

![alt text](./figures/grasp_rep.png)

The sole purpose of this small library is to train a network that, given an RGB image, is able (with some accuracy) to predict a possible grasp _g_.

## How to train from scratch

The procedure follows these steps:

- convert ImageNet to TFRecord format
- train the model on ImageNet
- convert the grasping dataset to TFRecords
- train on the grasping dataset using the pretrained weights

### Preparing ImageNet

Before running the script you will need to download the ImageNet data and convert it to native TFRecord format. Check this [link](https://github.com/tensorflow/models/tree/master/research/inception#getting-started) from the Inception model from Google. I found the whole Inception model on GitHub very useful.

### Training on ImageNet

Running `imagenet_classifier.py` will do the trick, but first change the default dataset directory (mine lies in `/root/imagenet-data`). Also check the end of the file for the options you can use, for example:

```bash
./imagenet_classifier.py --batch_size=128 --model_path=./models/imagenet/m1/m1.ckpt --train_or_validation=train
```

Running on a GTX 980 and a very^2 good Xeon, it needs around two days (I didn't time it). Check at the beginning that the model is saving/restoring the weights.

### Preparing Cornell Grasping Dataset

After downloading and decompressing the dataset, run `build_cgd_dataset.py` to convert it to TFRecords (see the sketch below for the grasp rectangle file format).
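Each image `pcdXXXXr.png` in the dataset comes with a `pcdXXXXcpos.txt` file listing the positive grasp rectangles as four corner points, one `x y` pair per line. The snippet below is only an illustration of how such a rectangle maps to the (x, y, \theta, h, w) representation from the problem description; the function name, the choice of which edge is treated as the width, and the NaN handling are my assumptions, not necessarily what `build_cgd_dataset.py` does.

```python
# Illustrative sketch (not the actual build_cgd_dataset.py code): convert the
# four corner points of a Cornell grasp rectangle into (x, y, theta, h, w).
import numpy as np

def load_grasps(cpos_path):
    """Parse a pcd*cpos.txt file into a list of (x, y, theta, h, w) grasps."""
    points = []
    with open(cpos_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            x, y = line.split()
            points.append((float(x), float(y)))

    grasps = []
    for i in range(0, len(points) - 3, 4):   # one rectangle = 4 corner points
        rect = np.array(points[i:i + 4])
        if np.isnan(rect).any():             # skip malformed annotations
            continue
        cx, cy = rect.mean(axis=0)           # rectangle center
        edge1 = rect[1] - rect[0]            # assumed: first edge gives width w
        edge2 = rect[2] - rect[1]            # assumed: second edge gives height h
        theta = np.arctan2(edge1[1], edge1[0])  # orientation w.r.t. image x-axis
        w = np.linalg.norm(edge1)
        h = np.linalg.norm(edge2)
        grasps.append((cx, cy, theta, h, w))
    return grasps

# Example (hypothetical path): grasps = load_grasps('pcd0100cpos.txt')
```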
Make sure to adapt the Python file to your needs, for example:

- point `dataset` to the right place
- adapt the number 49 in `filename[:49]` (you can contribute, or I will program it better someday)

### Train on grasping dataset

Just run `grasp_det.py` for the training, or pass some arguments as when training on ImageNet (see the example below). There are only around 1000 examples (images with grasps), so training is very fast. Be careful not to overfit.
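For example, assuming `grasp_det.py` accepts the same command-line flags as `imagenet_classifier.py` (check the flag definitions at the end of the file before relying on them), a training run could look like this; the checkpoint path is just a placeholder, and how the ImageNet-pretrained weights are restored depends on the paths set inside the script:

```bash
./grasp_det.py --batch_size=128 --model_path=./models/grasp/m2/m2.ckpt --train_or_validation=train
```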