VLSI Architecture for Pipeline and Parallel Array based Matrix Multiplication using Deep Learning Technique


Dheeraj Kumar, Prof. Suresh S. Gawande

Abstract

Matrix multiplication is a fundamental operation in scientific computation, image processing, and, in particular, deep learning, where it forms the backbone of convolutional and fully connected layers. The growing demand for real-time processing in artificial intelligence (AI) and machine learning systems necessitates highly efficient hardware implementations. This paper presents a VLSI architecture for pipelined and parallel array-based matrix multiplication optimized for deep learning workloads. The proposed design leverages pipelining to enhance throughput and reduce latency, while parallel array structures ensure efficient handling of large-scale matrix operations with minimal computational delay. By adopting deep learning-driven optimization strategies, the architecture achieves improvements in area utilization, delay, and power efficiency over conventional multiplier-based designs. Simulation and synthesis results validate the effectiveness of the proposed approach, demonstrating its suitability for high-performance computing platforms, neural network accelerators, and embedded AI systems.


In this paper, we have proposed a matrix multiplication (MM) architecture using a deep learning approach. The design reduces hardware complexity and delay, and supports configurable input/output data formats to match different application needs. The PPI-MO based MM is designed in Xilinx software, and the number of slices, look-up tables, and the delay are evaluated through simulation and synthesis.
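As a behavioral illustration of the pipelined parallel-array dataflow described above, the following sketch models an N x N output-stationary systolic array in software. This is not the paper's RTL design; the cycle-by-cycle skewing scheme, the output-stationary assumption, and the function name `systolic_matmul` are illustrative choices. Each processing element (PE) performs one multiply-accumulate per cycle, with operands skewed so that matching elements of A and B meet at the right PE:

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-accurate behavioral model of an N x N output-stationary
    systolic array (illustrative sketch, not the paper's RTL).

    PE(i, j) holds one accumulator C[i][j]. Rows of A stream in from
    the left and columns of B from the top, each delayed by one cycle
    per row/column (the input skew), so operand pairs A[i][k], B[k][j]
    arrive at PE(i, j) on cycle t = i + j + k.
    """
    N = A.shape[0]
    C = np.zeros((N, N), dtype=A.dtype)
    # The skewed wavefront needs 3N - 2 cycles to traverse the array.
    for t in range(3 * N - 2):
        for i in range(N):
            for j in range(N):
                k = t - i - j  # index of the operand pair reaching PE(i, j) now
                if 0 <= k < N:
                    C[i, j] += A[i, k] * B[k, j]  # one MAC per PE per cycle
    return C
```

In hardware, the two inner loops run fully in parallel (one MAC unit per PE) and the outer loop over `t` is the pipeline clock, which is where the throughput gain over a sequential multiplier comes from.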

Article Details

How to Cite
Dheeraj Kumar, Prof. Suresh S. Gawande. (2025). VLSI Architecture for Pipeline and Parallel Array based Matrix Multiplication using Deep Learning Technique. International Journal of Advanced Research and Multidisciplinary Trends (IJARMT), 2(3), 424–435. Retrieved from https://ijarmt.com/index.php/j/article/view/444
Section
Articles

References

E. S, S. S. A, S. D and N. M, "VLSI Implementation of Pipelined PE Systolic Array-Based 3x3 Matrix Multiplication for Deep Neural Network Accelerator," 2025 Fourth International Conference on Smart Technologies, Communication and Robotics (STCR), Sathyamangalam, India, 2025, pp. 1-4.

Alaejos, G., Castelló, A., Martínez, H., Alonso-Jordá, P., Igual, F.D., Quintana-Ortí, E.S.: Micro-kernels for portable and efficient matrix multiplication in deep learning. J. Supercomput. 79(7), 8124–8147, 2023.

Kim, D., Kim, J.: Analysis of several sparse formats for matrices used in sparse-matrix dense-matrix multiplication for machine learning on GPUs. In: 2022 13th International Conference on Information and Communication Technology Convergence (ICTC), pp. 629–631. IEEE, 2022

Kim, S., Kim, J., Kim, N., Kang, M., Seo, J.: Improving inference time of deep learning model with partial skip of ReLU-fused matrix multiplication operations. In: 2022 International Conference on Electronics, Information, and Communication (ICEIC), pp. 1–4. IEEE, 2022.

Rizwan, M., Jung, E., Park, Y., Choi, J., Kim, Y.: Optimization of matrix-matrix multiplication algorithm for matrix-panel multiplication on intel KNL. In: 2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA), pp. 1–7. IEEE, 2022.

Chen Yang, Siwei Xiang, Jiaxing Wang, Liyan Liang, “A High Performance and Full Utilization Hardware Implementation of Floating Point Arithmetic Units”, 28th IEEE International Conference on Electronics, Circuits, and Systems (ICECS), IEEE 2021.

Wei Mao, Kai Li, Xinang Xie, Shirui Zhao, He Li, Hao Yu, “A Reconfigurable Multiple-Precision Floating-Point Dot Product Unit for High-Performance Computing”, Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE 2021.

Di Yan, Wei-Xing Wang, Lei Zuo and Xiao-Wei Zhang, “Revisiting the Adjoint Matrix for FPGA Calculating the Triangular Matrix Inversion”, IEEE Transactions on Circuits and Systems II: Express Briefs, 2020.

X.-W. Zhang, L. Zuo, M. Li and J.-X. Guo, “High-throughput FPGA implementation of matrix inversion for control systems,” IEEE Trans. Ind. Electron., 2020.

S. Ross Thompson and James E. Stine, “A Novel Rounding Algorithm for a High Performance IEEE 754 Double-Precision Floating-Point Multiplier”, 38th International Conference on Computer Design (ICCD), IEEE 2020.

P.L. Lahari, M. Bharathi, Yasha Jyothi and M Shirur, “High Speed Floating Point Multiply Accumulate Unit using Offset Binary Coding”, 7th International Conference on Smart Structures and Systems (ICSSS), IEEE 2020.

C. Zhang et al., “On the low-complexity, hardware-friendly tridiagonal matrix inversion for correlated massive MIMO systems,” IEEE Trans. Vehic. Tech., vol. 68, no. 7, pp. 6272-6285, Jul. 2019.

S. Venkatachalam, E. Adams, H. J. Lee and S.-B. Ko, “Design and analysis of area and power efficient approximate booth multipliers,” IEEE Trans. Comput., vol. 68, no. 11, pp. 1697-1703, Nov. 2019.

Y.-W. Xu, Y. Xi, J. Lan and T.-F. Jiang, “An improved predictive controller on the FPGA by hardware matrix inversion,” IEEE Trans. Ind. Electron., vol. 65, no. 9, pp. 7395–7405, Sep. 2018.

Lakshmi Kiran Mukkara and K. Venkata Ramanaiah, “A Simple Novel Floating Point Matrix Multiplier VLSI Architecture for Digital Image Compression Applications”, 2nd International Conference on Inventive Communication and Computational Technologies (ICICCT), IEEE 2018.

D. Kalaiyarasi and M. Saraswathi, “Design of an Efficient High Speed Radix-4 Booth Multiplier for both Signed and Unsigned Numbers”, 4th International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB), IEEE 2018.
