Shaojiang Wang's research while affiliated with Chinese Academy of Sciences and other places


Publications (7)


The SEGE Algorithm
Overview of SEGE. Structural entropy measures the higher-order structure of graphs and is guided by unsupervised account features on transaction graphs, thereby identifying money laundering groups.
An illustration of the SE and SEGE algorithms. Let S be the predetermined set of suspicious nodes. The algorithm begins at state a. In step b, all neighboring nodes of S are identified. In step c, the node with the maximum structural entropy variation within the neighbor set is selected and added to S, and the set of neighboring vertices is updated. Step d repeats the same selection and update. The final vertex subset of suspicious accounts is built up gradually in this recursive fashion.
Visualization of MLI results on cnBank using SE, SEGE (a single trial), and three representative baselines: node2vec [42], DGI [47], and NetSMF [45]. Compared with the ground truth, blue nodes denote correctly identified (true positive) accounts, while red nodes denote false negative and false positive accounts.
Scores and community sizes achieved by SEGE with different embedding dimensions. a, b: SEGE on cnBank; c, d: SEGE on MahData. The ground-truth community size is 2690 for cnBank and 806 for MahData.


Structural entropy minimization combining graph representation for money laundering identification
  • Article
  • Full-text available

April 2024 · 36 Reads · International Journal of Machine Learning and Cybernetics

Shaojiang Wang · Pengcheng Wang · Bin Wu · [...] · Yicheng Pan

Money laundering identification (MLI) is a challenging task for financial AI research and application due to massive transaction volume, label sparseness, and label bias. Most existing MLI methods focus on individual-level abnormal behavior and neglect the community factor: money laundering is a collaborative group crime. Furthermore, the massive volume of transactions and the issue of label shifting impede the application of supervised or semi-supervised models. To this end, this paper proposes an efficient, unsupervised, community-oriented algorithm, SEGE, which identifies money laundering through structural entropy minimization (SEM) combined with graph embedding. Experiments on both a private real-world money laundering network and a public synthetic dataset show that SEGE performs strongly and outperforms parameterized learning-based graph representation methods. Moreover, we find that sub-communities are pervasive in the real-world money laundering network. Based on our local algorithm, we propose a practical strategy against a money laundering group: given several scattered suspicious accounts in the transaction network, we can retrieve the whole group as the union of sub-communities, with both high precision and high recall.
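The local expansion idea in the abstract and figure captions (start from suspicious seed accounts, repeatedly add the neighbor that most improves a community score) can be sketched as below. This is a minimal illustration only, not the SEGE algorithm: the paper's structural entropy objective and embedding guidance are not given on this page, so the sketch swaps in graph conductance as a stand-in score. The names `conductance` and `expand`, and the toy graph, are hypothetical.

```python
def conductance(adj, group):
    """Stand-in score (NOT structural entropy): cut edges / smaller side volume."""
    cut = sum(1 for u in group for v in adj[u] if v not in group)
    vol = sum(len(adj[u]) for u in group)
    total = sum(len(neighbors) for neighbors in adj.values())
    denom = min(vol, total - vol)
    return cut / denom if denom else 1.0


def expand(adj, seeds, score=conductance):
    """Greedily grow a vertex set from seeds while the score keeps decreasing."""
    group = set(seeds)
    current = score(adj, group)
    while True:
        # Neighbors of the current group that are not yet members.
        frontier = {v for u in group for v in adj[u]} - group
        if not frontier:
            break
        # Pick the neighbor giving the lowest score; ties broken by vertex id.
        best = min(frontier, key=lambda v: (score(adj, group | {v}), v))
        best_score = score(adj, group | {best})
        if best_score >= current:
            break  # no neighbor improves the score, so stop
        group.add(best)
        current = best_score
    return group


# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
found = expand(adj, {0})
```

On this toy graph the expansion from seed 0 stops at the first triangle, because adding vertex 3 would raise the stand-in score. Any other set-valued score could be passed through the `score` parameter.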



Integrative Analysis of Hepatic Metabolomic and Transcriptomic Data Reveals Potential Mechanism of Nonalcoholic Steatohepatitis in High‐fat Diet–fed Mice

October 2020 · 70 Reads · 7 Citations · Journal of Diabetes

Background: Due to the complex pathogenesis, the molecular mechanism of nonalcoholic steatohepatitis (NASH) remains unclear. In this study, we aimed to reveal the comprehensive metabolic and signaling pathways in the occurrence of NASH. Methods: C57BL/6 mice were treated with high-fat diet for 4 months to mimic the NASH phenotype. After the treatment, the physiochemical parameters were evaluated, and the liver tissues were prepared for untargeted metabolomic analysis with ultraperformance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry. Then, three relevant Gene Expression Omnibus (GEO) datasets were selected for integrative analysis of differentiated messenger RNA and metabolites. Results: The levels of phosphatidylethanolamine (PE) (16:1(9Z)/20:4(5Z,8Z,11Z,14Z)), oleic acid, and sphingomyelin (SM) (d18:0/12:0) were significantly increased, and the content of adenosine was severely reduced in NASH mice. The integrated interpretation of transcriptomic and metabolomic data indicated that the glycerophospholipid metabolism and necroptosis signaling were evidently affected in the development of NASH. The high level of SM (d18:0/12:0) may be related to the expression of acid sphingomyelinase (ASMase), and the elevated arachidonic acid was coordinated with the upregulation of cytosol phospholipase A2 (cPLA2) in the necroptosis pathway. Conclusions: In summary, the inflammatory response, necroptosis, and glycerophospholipid may serve as potential targets for mechanistic exploration and clinical practice in the treatment of NASH.


Flow-Based Clustering on Directed Graphs: A Structural Entropy Minimization Approach

December 2019 · 219 Reads · 5 Citations · IEEE Access

In this study, we focus on flow-based clustering on directed graphs and propose a localized algorithm for this problem. Flow-based clustering in networks seeks a set of closely related vertices for which the flow entering the set is larger than the flow leaving it. It can formulate a variety of practical problems, such as fund-raising set detection in financial networks and influential document clustering in citation networks. Methodologically, we propose the new concept of two-dimensional structural entropy on directed graphs and, based on it, design a local structural entropy minimization algorithm for detecting the flow-based community structure of networks. We apply our algorithm to fund-raising set detection in financial networks, in which vertices represent accounts, edges represent transactions between two accounts, and weights represent the money amounts of transactions. In our experiments, the local two-dimensional structural entropy minimization algorithm is used to find a fund-raising community that contains a given input account. We conduct experiments on both synthetic and real fund-raising datasets. The experimental results demonstrate that, given a fixed account, our algorithm efficiently locates a fund-raising community (if any) for which the funds flowing into the community are much higher than those flowing out, and for which transactions within the community are denser (by fund amount) than transactions between communities. For a synthetic ground-truth fund-raising community, we adjust the parameters to change its fund-raising tendency. The results on the synthetic datasets show that our algorithm obtains higher precision and recall as this tendency gets stronger, with each single factor varying. For a real fund-raising community embedded in a simulated capital flow network, our algorithm also finds it with high precision and recall. The experiments for both scenarios verify the effectiveness of our algorithm.
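The flow criterion described in this abstract (more weight flowing into a vertex set than out of it, with dense internal transactions) can be checked directly on a weighted directed edge list. The sketch below only measures those three totals for a given set; it does not implement the paper's two-dimensional structural entropy. The function name `flow_balance` and the toy transactions are assumptions made for illustration.

```python
def flow_balance(edges, community):
    """Return (inflow, outflow, internal) weight totals for a vertex set.

    edges: iterable of (source, target, weight) for a weighted directed graph.
    community: iterable of vertices forming the candidate set.
    """
    community = set(community)
    inflow = outflow = internal = 0.0
    for src, dst, weight in edges:
        src_in, dst_in = src in community, dst in community
        if src_in and dst_in:
            internal += weight   # transaction inside the set
        elif dst_in:
            inflow += weight     # money entering the set
        elif src_in:
            outflow += weight    # money leaving the set
    return inflow, outflow, internal


# Hypothetical toy transaction graph: accounts x and y collect funds.
edges = [
    ("a", "x", 50.0),
    ("b", "x", 30.0),
    ("c", "y", 20.0),
    ("x", "y", 40.0),
    ("y", "x", 10.0),
    ("y", "d", 5.0),
]
inflow, outflow, internal = flow_balance(edges, {"x", "y"})
```

Here the set {x, y} receives 100.0, sends out 5.0, and moves 50.0 internally, which is the kind of imbalance the abstract associates with a fund-raising community.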


Constrained maximum weighted bipartite matching: a novel approach to radio broadcast scheduling

July 2019 · 18 Reads · 6 Citations · Science China Information Sciences

Given a set of radio broadcast programs, the radio broadcast scheduling problem is to allocate a set of devices to transmit the programs to achieve the optimal sound quality. In this article, we propose a complete algorithm to solve the problem, which is based on a branch-and-bound (BnB) algorithm. We formulate the problem with a new model, called constrained maximum weighted bipartite matching (CMBM), i.e., the maximum matching problem on a weighted bipartite graph with constraints. For the reduced matching problem, we propose a novel BnB algorithm by introducing three new strategies, including the highest quality first, the least conflict first and the more edge first. We also establish an upper bound estimating function for pruning the search space of the algorithm. The experimental results show that our new algorithm can quickly find the optimal solution for the radio broadcast scheduling problem at small scales, and has higher scalability for the problems at large scales than the existing complete algorithm.
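To make the branch-and-bound idea concrete, here is a minimal sketch of maximum weighted bipartite matching with side constraints. It is not the paper's algorithm: the abstract does not specify the constraints, so they are assumed here to be pairs of (program, device) assignments that may not be chosen together, and weights are assumed non-negative. The function `solve_cmbm`, its arguments, and the example data are hypothetical. Edges are tried in descending quality, loosely echoing "highest quality first", and the bound adds the best remaining quality of each undecided program.

```python
def solve_cmbm(weights, conflicts):
    """Branch and bound over programs; returns (best value, matching).

    weights: dict mapping (program, device) -> non-negative quality.
    conflicts: set of frozensets, each holding two edges that cannot coexist
               (an assumed constraint model, not taken from the paper).
    """
    programs = sorted({p for p, _ in weights})
    # Candidate devices per program, highest quality first.
    options = {
        p: sorted(((w, d) for (q, d), w in weights.items() if q == p), reverse=True)
        for p in programs
    }
    # Optimistic bound: best single edge of every program from index i onward.
    suffix_best = [0.0] * (len(programs) + 1)
    for i in range(len(programs) - 1, -1, -1):
        suffix_best[i] = suffix_best[i + 1] + options[programs[i]][0][0]

    best = {"value": 0.0, "matching": []}

    def search(i, used, chosen, value):
        if value + suffix_best[i] <= best["value"]:
            return  # prune: this branch cannot beat the incumbent
        if i == len(programs):
            best["value"], best["matching"] = value, list(chosen)
            return
        p = programs[i]
        for w, d in options[p]:
            edge = (p, d)
            if d in used:
                continue  # device already transmits another program
            if any(frozenset((edge, other)) in conflicts for other in chosen):
                continue  # violates an assumed pairwise constraint
            used.add(d)
            chosen.append(edge)
            search(i + 1, used, chosen, value + w)
            chosen.pop()
            used.remove(d)
        search(i + 1, used, chosen, value)  # leave program p unassigned

    search(0, set(), [], 0.0)
    return best["value"], best["matching"]


# Hypothetical instance: two programs, two transmitters, one forbidden pair.
weights = {
    ("news", "t1"): 9, ("news", "t2"): 6,
    ("music", "t1"): 8, ("music", "t2"): 3,
}
conflicts = {frozenset({("news", "t2"), ("music", "t1")})}
value, matching = solve_cmbm(weights, conflicts)
```

Without the conflict, the best total would be 14 (news on t2, music on t1); with it, the search settles on news on t1 and music on t2 for a total of 12.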


Rectangle Transformation Problem

July 2019 · 84 Reads · 2 Citations · Algorithmica

In this paper, we propose the rectangle transformation problem (RTP) and its variants. RTP asks for a transformation by a rectangle partition between two rectangles of the same area. We are interested in the minimum RTP, which requires minimizing the partition size. We mainly focus on the strict rectangle transformation problem (SRTP), in which rotation is not allowed in transforming. We show that SRTP has no finite solution if the ratio of the two parallel side lengths of the input rectangles is irrational. So we turn to its complement, denoted by SIRTP, in which case all side lengths can be assumed integral. We give a polynomial time algorithm ALGSIRTP which gives a solution of size at most $q/p+O(\sqrt{p})$ to SIRTP$(p,q)$ ($q\geq p$), where $p$ and $q$ are the two integer side lengths of the input rectangles $p\times q$ and $q\times p$, and so ALGSIRTP is an $O(\sqrt{p})$-approximation algorithm for minimum SIRTP$(p,q)$. On the other hand, we show that there is no constant solution to SIRTP$(p,q)$ for all integers $p$ and $q$ ($q>p$), even when the ratio $q/p$ is within any constant range. We also raise a series of open questions for research along this line.


Citations (4)


... Selected zebrafish were placed in tanks of the same size (50 × 20 × 10 cm) with a capacity of 10 L. The stocking density of zebrafishes is 2.5 Tail/L. Zebrafishes were exposed to OTC for 21 days in accordance with the previous study (Wang et al. 2021b). Uneaten food and fecal material were removed from the tanks each day by carefully cleaning the bottom of each tank using suction to avoid microbial growth and water turbidity. ...

Reference:

Transcriptomic Analysis of Hepatotoxicology of Adult Zebrafish (Danio rerio) Exposed to Environmentally Relevant Oxytetracycline
Integrative Analysis of Hepatic Metabolomic and Transcriptomic Data Reveals Potential Mechanism of Nonalcoholic Steatohepatitis in High‐fat Diet–fed Mice
Journal of Diabetes


... Approximate entropy (ApEn) can quantitatively describe the complexity of time series by calculating the marginal probability distribution [35]. It has been widely used to measure the randomness and irregularity of sequences. ...

Flow-Based Clustering on Directed Graphs: A Structural Entropy Minimization Approach

IEEE Access

... The problem is how to make the Program interesting and get listeners. This is a very important aspect of the concept of "radio programming" and is equivalent to the development of the format (Wang, 2019). For example, a successful commercial radio broadcasting station will attract and reach a specific group of listeners. ...

Constrained maximum weighted bipartite matching: a novel approach to radio broadcast scheduling
  • Citing Article
  • July 2019

Science China Information Sciences

... On the one hand, there are many testing environments that adopt the principles of CIT, for example, even sequence testing [104]- [106], grammar-based testing [107], security testing [108]- [110], scenario-based testing [111], solution testing [112], and certificate testing [113]. On the other hand, many system applications have been tested by CIT, including MP3 applications [114], [115], concurrent programs [116], [117], cloud environments [118], mobile applications [119]- [121], software products lines [19], [20], big data applications [122], industrial settings [123], web applications [124]- [126], and cyber-physical systems [127]. ...

Combinatorial Testing on MP3 for Audio Players
  • Citing Conference Paper
  • March 2017