Time: Thursday, October 30, 1:30 p.m.
Venue: Room 1110, 11th Floor, 吕大龙楼 (Lü Dalong Building)
Talk 1: Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected
Speaker: Diego Cerretti
Abstract:
This study aims to enlarge our current knowledge of how brain-inspired network science principles can be applied to training artificial neural networks (ANNs) with sparse connectivity. Dynamic sparse training (DST) emulates the synaptic turnover of real brain networks, reducing the computational demands of training and inference in ANNs. However, existing DST methods struggle to maintain peak performance at high connectivity sparsity levels. Cannistraci-Hebb training (CHT) is a brain-inspired DST method for growing synaptic connectivity in sparse neural networks. CHT leverages a gradient-free, topology-driven link regrowth mechanism, which has been shown to achieve an ultra-sparse (1% connectivity or lower) advantage over fully connected networks across various tasks. Yet CHT suffers from two main drawbacks: (i) its time complexity is O(Nd³), where N is the network size in nodes and d is the node degree, so it can be efficiently applied only to ultra-sparse networks; (ii) it rigidly selects the top link-prediction scores, which is inappropriate for the early training epochs, when the network topology contains many unreliable connections. Here, we design the first brain-inspired network model, termed bipartite receptive field (BRF), to initialize the connectivity of sparse artificial neural networks. We then propose a GPU-friendly, matrix-multiplication-based approximation of the CH link predictor, which reduces the computational complexity to O(N³) and enables fast link prediction in large-scale models. Moreover, we introduce the Cannistraci-Hebb training soft rule (CHTs), which adopts a flexible strategy for sampling connections in both link removal and regrowth, balancing exploration and exploitation of the network topology. Finally, we propose a sigmoid-based gradual density decay strategy, leading to an advanced framework referred to as CHTss.
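The sigmoid-based gradual density decay mentioned above could be sketched as follows. The exact parameterization used in CHTss is not given in the abstract, so the endpoints `d_start` and `d_final` and the steepness `k` are illustrative assumptions:

```python
import math

def sigmoid_density_schedule(step, total_steps, d_start=1.0, d_final=0.05, k=10.0):
    """Hypothetical sigmoid-shaped density decay from d_start to d_final.

    step        -- current training step
    total_steps -- number of steps over which density decays
    k           -- steepness of the sigmoid transition (assumed value)
    """
    t = step / total_steps
    # Sigmoid centered at the midpoint of training: goes from ~1 to ~0.
    s = 1.0 / (1.0 + math.exp(k * (t - 0.5)))
    # Normalize so the schedule hits d_start and d_final exactly at the endpoints.
    s0 = 1.0 / (1.0 + math.exp(-k * 0.5))
    s1 = 1.0 / (1.0 + math.exp(k * 0.5))
    s_norm = (s - s1) / (s0 - s1)
    return d_final + (d_start - d_final) * s_norm
```

The sigmoid shape keeps the network dense early (when gradients are most informative) and sparsifies it smoothly toward the target density, in contrast to an abrupt one-shot pruning step.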
Empirical results show that BRF offers performance advantages over previous network science models. Using 1% of the connections, CHTs outperforms fully connected networks in MLP architectures on visual classification tasks, compressing some networks to fewer than 30% of the nodes. Using 5% of the connections, CHTss outperforms fully connected networks in two Transformer-based machine translation tasks. Finally, with only 30% of the connections, both CHTs and CHTss achieve superior performance over other dynamic sparse training methods, and perform on par with, or even surpass, their fully connected counterparts in language modeling across various sparsity levels within the LLaMA model family.
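To give a concrete picture of the gradient-free, topology-driven regrowth described in this abstract, here is a simplified sketch: a length-3 path counter on a bipartite layer mask, computed with matrix multiplications. This is not the exact CH link predictor used by CHT/CHTs, only a hypothetical stand-in that shows why such scores are GPU-friendly:

```python
import numpy as np

def l3_path_scores(A):
    """Score candidate links in a bipartite layer by length-3 path counts.

    A is the (n_in x n_out) binary adjacency mask of a sparse layer.
    (A @ A.T @ A)[i, j] counts length-3 paths i -> j; here it serves as a
    simplified, matrix-multiplication-friendly proxy for the CH predictor.
    """
    A = A.astype(float)
    scores = A @ A.T @ A
    scores[A > 0] = -np.inf  # only score links that do not yet exist
    return scores

def regrow_top_k(A, k):
    """Regrow the k non-existing links with the highest path scores."""
    scores = l3_path_scores(A)
    flat = np.argsort(scores, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, A.shape)
    new = A.copy()
    new[rows, cols] = 1
    return new
```

The soft rule (CHTs) would replace the hard top-k selection in `regrow_top_k` with probabilistic sampling proportional to the scores, trading exploitation for exploration early in training.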
Talk 2: A generalized logistic-logit function and its application to multi-layer perceptron and neuron segmentation
Speaker: 谷文祺
Abstract:
Logistic and logit functions play important roles in modern science, serving as foundational tools in various applications, including artificial neural networks (ANNs). While there are functions that produce distinct logistic and logit curves, no single, unified framework has been developed to generate both. We introduce a generalized logistic-logit function (CMG-GLLF) to fill this gap. CMG-GLLF provides four interpretable and trainable parameters that allow explicit control over curve type and steepness, asymmetry, and the upper and lower limits of the x- and y-axes. We explore CMG-GLLF's potential in basic machine intelligence tasks. We propose a trainable input feature modulator (IFM) for the multi-layer perceptron (MLP) that consists of learning the parameters of the CMG-GLLF for each input-layer node during backpropagation, giving the MLP superior accuracy and faster learning in image classification. Furthermore, CMG-GLLF used as a data transformation enhances the accuracy of affinity-graph-based neuron segmentation. CMG-GLLF combines in a unique framework the ability of the logistic and logit functions to modulate signals or variables, covering a full spectrum of attenuation or amplification transformations. CMG-GLLF is flexible and trainable, has the potential to advance machine learning models, and can inspire further applications to other data analysis challenges across domains of science.
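The exact form of CMG-GLLF is not given in the abstract. As a generic illustration of a four-parameter logistic with a closed-form inverse (logit), here is a Richards-style curve where `lower`/`upper` set the y-axis limits, `steepness` sets the slope, and `asymmetry` skews the curve around its midpoint; all names and the parameterization are assumptions, not the authors' definition:

```python
import math

def generalized_logistic(x, lower=0.0, upper=1.0, steepness=1.0, asymmetry=1.0):
    """Four-parameter generalized logistic curve (hypothetical stand-in
    for CMG-GLLF). With defaults, it reduces to the standard sigmoid."""
    return lower + (upper - lower) / (1.0 + math.exp(-steepness * x)) ** (1.0 / asymmetry)

def generalized_logit(y, lower=0.0, upper=1.0, steepness=1.0, asymmetry=1.0):
    """Exact inverse of generalized_logistic for y in (lower, upper)."""
    z = ((upper - lower) / (y - lower)) ** asymmetry - 1.0
    return -math.log(z) / steepness
```

Because the pair is mutually inverse, the same parameter set can be trained once and applied in either direction, which is the kind of unified logistic/logit behavior the abstract describes; in an IFM-like setting, the four parameters would be made trainable per input node.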