Extracting Drawing Information from a PDF or Image with OpenCV and Writing It to DWG

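This post is a starting point. It describes how to use OpenCV to extract drawing information from a PDF or an image and write it to a DWG file. Because a PDF is first converted to OpenCV's cv::Mat image format before anything is extracted, the post focuses on the PDF case.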

1. To extract information from a PDF that contains a drawing, the PDF must first be parsed and rendered into a format that OpenCV can process.

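The figure above shows a PDF file that contains a drawing. I chose the open-source poppler library to parse it; poppler can parse a PDF file and render a page into image data.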

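    // poppler-cpp and OpenCV headers; adjust the paths to your installation
    #include <memory>
    #include <string>
    #include <opencv2/core.hpp>
    #include <poppler/cpp/poppler-document.h>
    #include <poppler/cpp/poppler-image.h>
    #include <poppler/cpp/poppler-page.h>
    #include <poppler/cpp/poppler-page-renderer.h>

    cv::Mat pdfToMat(const std::string& pdfPath, int pageIdx = 0, int dpi = 300)
    {
        // load_from_file and create_page return owning raw pointers,
        // so hold both in unique_ptr and bail out on failure
        std::unique_ptr<poppler::document> doc(poppler::document::load_from_file(pdfPath));
        if (!doc) return cv::Mat();
        std::unique_ptr<poppler::page> page(doc->create_page(pageIdx));
        if (!page) return cv::Mat();

        poppler::page_renderer pr;
        pr.set_image_format(poppler::image::format_gray8);
        poppler::image img = pr.render_page(page.get(), dpi, dpi);
        if (!img.is_valid()) return cv::Mat();

        // Wrap the poppler buffer without copying, then clone so the
        // returned Mat owns its pixels after img is destroyed
        cv::Mat mat(img.height(), img.width(), CV_8UC1, (void*)img.data(), img.bytes_per_row());
        return mat.clone();
    }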

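Note that the render format must be set to grayscale (poppler::image::format_gray8) when rendering the PDF to an image. The cv::Mat above is built as CV_8UC1, one byte per pixel, and rendering with the default RGB format is somewhat problematic here.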
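pdfToMat passes img.bytes_per_row() to the cv::Mat constructor as the row step because a rendered row may occupy more bytes than its visible width. The following plain C++ sketch (no poppler or OpenCV) shows what honoring that stride means for an 8-bit grayscale buffer. The helper name packRows and its parameters are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy a width x height 8-bit image out of a buffer
// whose rows are bytesPerRow bytes apart (bytesPerRow >= width), producing
// a tightly packed buffer with no padding between rows.
std::vector<unsigned char> packRows(const unsigned char* src,
                                    int width, int height, int bytesPerRow)
{
    assert(src != nullptr && width > 0 && height > 0 && bytesPerRow >= width);
    std::vector<unsigned char> tight(static_cast<std::size_t>(width) * height);
    for (int row = 0; row < height; ++row) {
        // Each source row starts at a multiple of the stride, not the width.
        std::memcpy(tight.data() + static_cast<std::size_t>(row) * width,
                    src + static_cast<std::size_t>(row) * bytesPerRow,
                    static_cast<std::size_t>(width));
    }
    return tight;
}
```

Reading the buffer as if rows were exactly width bytes apart would shear the image whenever padding is present, which is why the step argument matters.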

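2. Use OpenCV on the resulting cv::Mat to run edge detection, contour finding, and polygon fitting of the contour points, which recovers the lines and other geometry in the drawing.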

a> Edge detection

    cv::Mat edge;
    cv::Canny(pdfMat, edge, 250, 300);
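The two numbers passed to cv::Canny, 250 and 300, are the lower and upper hysteresis thresholds. The sketch below illustrates only that hysteresis idea in plain C++: pixels whose gradient magnitude exceeds the upper threshold are accepted as edges, and pixels above the lower threshold are accepted only if they connect to an accepted pixel. The function name hysteresis and the grid-of-floats input are assumptions made for illustration. OpenCV computes gradients and applies non-maximum suppression before this step, and its implementation may differ in detail.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper, not an OpenCV API: double-threshold hysteresis on a
// rectangular grid of gradient magnitudes. Returns 1 for edge pixels, else 0.
std::vector<std::vector<int>> hysteresis(
    const std::vector<std::vector<float>>& mag, float low, float high)
{
    assert(low <= high);
    const int rows = static_cast<int>(mag.size());
    const int cols = rows > 0 ? static_cast<int>(mag[0].size()) : 0;
    std::vector<std::vector<int>> edge(rows, std::vector<int>(cols, 0));
    std::queue<std::pair<int, int>> pending;

    // Pass 1: pixels above the upper threshold are edges unconditionally.
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (mag[r][c] > high) {
                edge[r][c] = 1;
                pending.push(std::make_pair(r, c));
            }

    // Pass 2: grow accepted edges into 8-connected neighbours that are
    // above the lower threshold.
    while (!pending.empty()) {
        const std::pair<int, int> p = pending.front();
        pending.pop();
        for (int dr = -1; dr <= 1; ++dr)
            for (int dc = -1; dc <= 1; ++dc) {
                const int nr = p.first + dr;
                const int nc = p.second + dc;
                if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                if (edge[nr][nc] == 0 && mag[nr][nc] > low) {
                    edge[nr][nc] = 1;
                    pending.push(std::make_pair(nr, nc));
                }
            }
    }
    return edge;
}
```

With thresholds as close together as 250 and 300, few pixels fall in the "weak" band, so the result is dominated by strong gradients such as dark drawing lines on a white page.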

b> Contour finding

    std::vector<std::vector<cv::Point>> contours;
    std::vector<cv::Vec4i> hierarchy;
    cv::findContours(
        edge,
        contours,
        hierarchy,
        cv::RETR_LIST,
        cv::CHAIN_APPROX_SIMPLE);

c> Fitting the contour points

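    std::vector<std::vector<cv::Point>> approxPolys;
    approxPolys.reserve(contours.size());
    for (const auto& contour : contours)
    {
        std::vector<cv::Point> approx;
        // epsilon = 2.0 pixels; closed = false treats the contour as an open curve
        cv::approxPolyDP(contour, approx, 2.0, false);
        approxPolys.push_back(approx);
    }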
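cv::approxPolyDP is based on the Douglas-Peucker algorithm. To make the role of the 2.0 tolerance concrete, here is a minimal plain C++ sketch of that algorithm for an open curve. The names Pt, distToLine, simplifyRange, and simplifyOpenCurve are hypothetical, and the sketch is not OpenCV's implementation, which may differ in detail.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x; double y; };  // hypothetical stand-in for cv::Point

// Perpendicular distance from p to the line through a and b
// (falls back to the distance to a when a and b coincide).
double distToLine(const Pt& p, const Pt& a, const Pt& b)
{
    const double dx = b.x - a.x;
    const double dy = b.y - a.y;
    const double len = std::hypot(dx, dy);
    if (len == 0.0) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

// Mark the interior point of pts[first..last] farthest from the chord if it
// is farther than epsilon, then recurse into both halves.
void simplifyRange(const std::vector<Pt>& pts, std::size_t first,
                   std::size_t last, double epsilon, std::vector<bool>& keep)
{
    double maxDist = 0.0;
    std::size_t farthest = first;
    for (std::size_t i = first + 1; i < last; ++i) {
        const double d = distToLine(pts[i], pts[first], pts[last]);
        if (d > maxDist) { maxDist = d; farthest = i; }
    }
    if (maxDist > epsilon) {
        keep[farthest] = true;
        simplifyRange(pts, first, farthest, epsilon, keep);
        simplifyRange(pts, farthest, last, epsilon, keep);
    }
}

// Simplify an open polyline; the two end points are always kept.
std::vector<Pt> simplifyOpenCurve(const std::vector<Pt>& pts, double epsilon)
{
    assert(epsilon >= 0.0);
    if (pts.size() < 3) return pts;
    std::vector<bool> keep(pts.size(), false);
    keep[0] = true;
    keep[pts.size() - 1] = true;
    simplifyRange(pts, 0, pts.size() - 1, epsilon, keep);

    std::vector<Pt> out;
    for (std::size_t i = 0; i < pts.size(); ++i)
        if (keep[i]) out.push_back(pts[i]);
    return out;
}
```

A larger epsilon removes more points and yields fewer, longer segments in the DWG; a smaller one follows the pixel contour more closely but produces more short segments.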

3. Write the fitted result to a DWG file

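The fitted data is in pixel units. When writing to the DWG file you can assign a physical size. Here the extent along X is set to 1000 mm, and the extent along Y is computed from the aspect ratio of the cv::Mat.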

    #define X_RANGE 1000
    // height / width ratio of the image
    double whScale = static_cast<double>(edge.rows) / edge.cols;
    double yRange = X_RANGE * whScale;
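The pixel-to-millimetre conversion can be sketched as a small stand-alone function. It scales X by the chosen drawing width and flips Y, because image rows grow downward while drawing coordinates grow upward. The names DrawingPoint and pixelToDrawing are hypothetical, and the sketch uses plain C++ so that it does not depend on OpenCV.

```cpp
#include <cassert>

struct DrawingPoint { double x; double y; };  // hypothetical, in millimetres

// Hypothetical helper: map pixel (px, py) of an imgCols x imgRows image to
// drawing coordinates whose X extent is xRangeMm and whose Y extent follows
// the image aspect ratio. Y is flipped so the image top maps to the largest Y.
DrawingPoint pixelToDrawing(int px, int py, int imgCols, int imgRows,
                            double xRangeMm)
{
    assert(imgCols > 0 && imgRows > 0);
    const double yRangeMm = xRangeMm * imgRows / imgCols;
    DrawingPoint out;
    out.x = static_cast<double>(px) / imgCols * xRangeMm;
    out.y = (1.0 - static_cast<double>(py) / imgRows) * yRangeMm;
    return out;
}
```

For a 2000 x 1000 pixel image and a 1000 mm X extent, the top-left pixel (0, 0) maps to (0, 500) and the image centre (1000, 500) maps to (500, 250).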

Then iterate over the fitted data and call the libredwg API to write it to the DWG file.

Create a document to write into:

    //----------------------------
    // Create the DWG document: version R_2000, imperial = 0, loglevel = 3
    Dwg_Data* dwg = dwg_new_Document(R_2000, 0, 3);
    auto mspace = dwg_model_space_object(dwg);
    auto hdr = mspace->tio.object->tio.BLOCK_HEADER;
    //-----------------------------

The interface that writes data into the document:

    void DWGWriter::AddLine(cv::Point2f& pt1, cv::Point2f& pt2)
    {
        dwg_point_3d p1 = { pt1.x, pt1.y, 0.0 };
        dwg_point_3d p2 = { pt2.x, pt2.y, 0.0 };
        dwg_add_LINE(hdr, &p1, &p2);
    }

Iterate over the fitted data and call the write interface:

    DWGWriter writer;
    for (auto& approxPoly : approxPolys)
    {
        // index + 1 < size() avoids the unsigned wrap-around of size() - 1
        for (size_t index = 0; index + 1 < approxPoly.size(); ++index)
        {
            auto& curStartPt = approxPoly[index];
            auto& curEndPt = approxPoly[index + 1];
            cv::Point2f startPoint;
            cv::Point2f endPoint;
            // Scale pixels to millimetres and flip Y (image rows grow downward)
            startPoint.x = ((float)curStartPt.x / edge.cols) * X_RANGE;
            startPoint.y = (1.0 - (float)curStartPt.y / edge.rows) * yRange;
            endPoint.x = ((float)curEndPt.x / edge.cols) * X_RANGE;
            endPoint.y = (1.0 - (float)curEndPt.y / edge.rows) * yRange;
            writer.AddLine(startPoint, endPoint);
        }
    }

Finally, call dwg_write_file(dwgFile.c_str(), dwg); to finish writing the DWG file.

4. The result of extracting the data from the PDF shown at the start of this post and writing it to a DWG file is as follows

Follow-up notes:

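This post is only a starting point for recognizing and extracting drawing information. The result still contains many fragmented line segments, and follow-up issues such as text recognition remain to be handled.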