Trading Systems Literature Review Project

01

I started by turning the review into a research system

My first decision was methodological: I did not want the project to end as a static spreadsheet. I wanted a workflow where I could load the Scopus exports, tune the term filters, rerun the analysis, inspect each output, and download the evidence. The Shiny dashboard gave me that control surface, so the case study became something I could demonstrate and reproduce rather than only describe.

Shiny dashboard overview for the trading systems literature review — The dashboard is where I controlled the pipeline, checked the corpus counts, and exported the research artifacts.

02

I made the screening process defensible

The PRISMA-style flow became the backbone of the case. I started with 588 Scopus records from 20 search queries, removed 77 duplicates using a normalized title key, and retained 511 studies with non-empty abstracts. I also separated the 491 papers that passed the keyword relevance logic from the 20 borderline records that needed human review, because the uncertain cases are part of the method rather than a nuisance to hide.

PRISMA 2020 flow diagram for the trading systems literature review corpus — The PRISMA flow records the chain from 588 raw Scopus records to 511 included studies, including the manual-review queue.
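
To make the deduplication step concrete, here is a minimal sketch in Python rather than the R code behind the dashboard. The `title` field name and the exact normalization rules (ASCII folding, lowercasing, stripping punctuation) are assumptions for illustration; the project's own key may be built differently. The counts in the final check are the ones reported above.

```python
import re
import unicodedata


def title_key(title: str) -> str:
    """Build a normalized key: ASCII-fold, lowercase, drop punctuation, collapse spaces."""
    folded = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", folded.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each title key; return (kept, removed_count)."""
    seen = set()
    kept = []
    for record in records:
        key = title_key(record["title"])  # "title" is an assumed field name
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept, len(records) - len(kept)


# Illustrative records only, not taken from the corpus.
records = [
    {"title": "Blockchain-Based Energy Trading: A Review"},
    {"title": "blockchain based energy trading - a review "},
    {"title": "Stock Price Prediction with LSTM"},
]
kept, removed = deduplicate(records)

# Reconciliation of the counts reported in the text.
raw, duplicates, included = 588, 77, 511
passed_keywords, manual_review = 491, 20
assert raw - duplicates == included
assert passed_keywords + manual_review == included
```

Keeping the first occurrence is one possible tie-break; a pipeline could just as well prefer the record with the most complete metadata.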

03

I separated the corpus into auditable data products

I built two linked datasets. Data A is the cleaned master table with bibliographic fields, search provenance, PaperID, and topic flags. Data B is the abstract-only text-mining input, reduced to PaperID and abstract so every downstream token, map, and correlation can be traced back to a source record. The run log closes the loop by recording the exact counts and parameter values from the pipeline run.

Data A master corpus table with provenance and topic fields — Data A preserves the bibliographic and provenance fields I needed for auditability.

Data B abstract text-mining table — Data B gives the text-mining steps a controlled PaperID plus abstract input.

Run log and metadata table from the Shiny dashboard — The run log records counts, thresholds, token totals, and pipeline availability checks.
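
The split into Data A, Data B, and a run log can be sketched as follows. This is an illustration in Python, not the project's R code: the `P0001` ID format, the `source_query` field, and the run-log keys are hypothetical names chosen for the example.

```python
def build_data_products(records):
    """Return (data_a, data_b, run_log) for a list of cleaned record dicts."""
    data_a, data_b = [], []
    for index, record in enumerate(records, start=1):
        paper_id = f"P{index:04d}"  # hypothetical ID format
        # Data A: full master row with the shared identifier.
        data_a.append({"PaperID": paper_id, **record})
        # Data B: only the identifier and the abstract, for text mining.
        data_b.append({"PaperID": paper_id, "abstract": record["abstract"]})
    # Run log: counts recorded at the time of the run (key names are assumed).
    run_log = {"n_records": len(data_a), "n_abstracts": len(data_b)}
    return data_a, data_b, run_log


# Illustrative records only.
records = [
    {"title": "Example title one", "source_query": "Q01", "abstract": "First example abstract."},
    {"title": "Example title two", "source_query": "Q02", "abstract": "Second example abstract."},
]
data_a, data_b, run_log = build_data_products(records)
```

Because both tables carry the same PaperID, any token or map coordinate derived from Data B can be joined back to its bibliographic row in Data A.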

04

I used text mining to identify the vocabulary of the field

Once the corpus was clean, I used tokenization, stop-word removal, noise-word filtering, and frequency thresholds to focus on informative terms. The top terms show the working language of this review: requirements, development, online systems, blockchain, networks, commerce, decision-making, risk, price, and markets. That vocabulary told me the corpus was not only about the finance of trading; it was also about systems, platforms, user-facing technology, and decision support.

Top 20 most frequent terms after text-mining filters — The top-term chart shows the filtered vocabulary that shaped the later heatmap and MDS maps.
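
The filtering chain described above (tokenize, drop stop words and noise words, apply a frequency threshold) can be sketched in a few lines of Python. The word lists, the minimum token length, and the `min_count` threshold are placeholders, not the values used in the dashboard.

```python
import re
from collections import Counter

# Placeholder lists; the real pipeline's lists are not given in the text.
STOP_WORDS = {"the", "a", "of", "and", "for", "in", "to", "is", "on", "this"}
NOISE_WORDS = {"paper", "study", "results"}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(abstracts, min_count=2):
    """Count informative tokens and keep those at or above the threshold."""
    counts = Counter()
    for abstract in abstracts:
        for token in tokenize(abstract):
            if token in STOP_WORDS or token in NOISE_WORDS or len(token) < 3:
                continue
            counts[token] += 1
    return {term: n for term, n in counts.items() if n >= min_count}


# Illustrative abstracts only.
abstracts = [
    "This paper studies blockchain trading systems for energy markets.",
    "Risk and price prediction in stock markets.",
    "A blockchain platform for online commerce and trading.",
]
frequent = term_frequencies(abstracts, min_count=2)
```

Raising `min_count` is the kind of tuning the dashboard's term filters expose: a higher threshold gives a shorter, more stable vocabulary at the cost of rarer topics.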

05

I tested whether the vocabulary had structure

Frequency alone is not enough for a literature review, so I checked whether terms and papers actually formed interpretable relationships. The correlation heatmap shows how the top terms co-occur across abstracts. The document map projects the 511 papers into two dimensions by applying MDS to TF-IDF distances, revealing a dense core plus visible outliers. The term map applies the same idea to term correlations and gives me a cleaner concept-level view, where topics such as blockchain, requirements, networks, risk, investment, prediction, commerce, and social context occupy different regions of the map.

Correlation heatmap for the top 20 terms — The heatmap helped me see which terms belonged together and which terms marked separate streams.

Document map using two-dimensional MDS of TF-IDF distances — The document map exposed a dominant paper cluster and a small set of records worth closer inspection.

Term map using two-dimensional MDS of correlation distances — The term map summarizes the corpus at the concept level rather than the document level.
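
As a rough illustration of the input behind the heatmap and the term map, the sketch below computes Pearson correlations between per-abstract term counts and converts them to distances as 1 minus the correlation. That conversion is one common choice and an assumption here; the MDS projection itself is left out, since it would normally come from a numerical library.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def term_vectors(token_lists, terms):
    """For each term, count its occurrences in every document."""
    return {term: [tokens.count(term) for tokens in token_lists] for term in terms}


def correlation_distances(vectors):
    """Pairwise distance 1 - r between terms; this matrix would feed an MDS routine."""
    terms = sorted(vectors)
    return {(a, b): 1.0 - pearson(vectors[a], vectors[b]) for a in terms for b in terms}


# Illustrative tokenized abstracts only.
docs = [
    ["stock", "price", "stock"],
    ["blockchain", "energy"],
    ["stock", "price"],
    ["blockchain", "energy", "energy"],
]
vectors = term_vectors(docs, ["stock", "price", "blockchain", "energy"])
distances = correlation_distances(vectors)
```

Terms that appear in the same abstracts end up close together (distance near 0), while terms that never share an abstract are pushed apart (distance near 2), which is what lets separate streams occupy separate regions of the map.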

06

I added bibliometrix to place the corpus in the academic field

After the text-mining layer, I used bibliometrix to ask a different question: where does this research live academically? The annual production curve shows the field accelerating sharply toward 2024 and 2025. Bradford analysis shows a dispersed publication base with core sources such as Lecture Notes in Computer Science, Lecture Notes in Networks and Systems, ACM proceedings, IEEE Access, and Communications in Computer and Information Science. Keyword growth shows commerce, electronic trading, financial markets, investments, decision-making, forecasting, and blockchain becoming increasingly visible.

Annual scientific production chart for the trading systems corpus — The publication curve shows the review topic becoming much more active in the most recent years.

Bradford law plot for source concentration — Bradford analysis helped me describe whether the field is concentrated in a small source core or spread across many venues.

Top keywords cumulative growth chart — Keyword growth shows how commerce, electronic trading, financial markets, investments, decision-making, forecasting, and blockchain developed over time.
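
The idea behind the Bradford analysis can be sketched as follows: rank sources by how many papers they contribute, then split the ranking into three zones that each hold roughly a third of the papers. This is a simplified Python illustration with made-up source names; the cut rule used here (a source's zone is set by the cumulative share before it is added) is an assumption and may differ from the one bibliometrix applies.

```python
from collections import Counter


def bradford_zones(source_per_paper):
    """Assign each source to zone 1, 2, or 3 by cumulative paper share."""
    ranked = Counter(source_per_paper).most_common()  # most productive first
    total = sum(n for _, n in ranked)
    zones = {}
    papers_before = 0
    for source, n in ranked:
        # Integer comparisons avoid floating-point edge cases at the cuts.
        if 3 * papers_before < total:
            zones[source] = 1
        elif 3 * papers_before < 2 * total:
            zones[source] = 2
        else:
            zones[source] = 3
        papers_before += n
    return zones


# Illustrative data: one source label per paper.
sources = ["A"] * 6 + ["B"] * 3 + ["C"] * 2 + ["D", "E", "F", "G"]
zones = bradford_zones(sources)
```

In this toy example one source fills the core zone while four sources are needed for the last zone; a dispersed publication base shows the same pattern with a much longer tail.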

07

I moved from charts to thematic interpretation

The thematic analysis gave me language for the final interpretation. Commerce appeared as the strongest motor theme, while blockchain and finance had high development density. Electronic commerce and decision-making sat closer to basic or transversal themes, and the conceptual structure map separated a broad trading-systems/finance/user-interface region from a smaller AI and language-processing region. This helped me write the case as a map of research streams, not just a list of outputs.

Bibliometrix thematic map for the trading systems corpus — The thematic map positions research streams by centrality and density, turning keyword clusters into interpretable themes.

Conceptual structure map generated with multiple correspondence analysis — The conceptual structure map shows how terms such as stock prediction, time series, user interfaces, data mining, and language processing separate into broader regions.
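
The quadrant reading of a thematic map can be illustrated with a short sketch. Each theme has a centrality score (how strongly it connects to other themes) and a density score (how developed it is internally), and the quadrant follows from whether each score is high or low. The theme names and scores below are placeholders, and splitting the axes at the median is an assumption; it is not taken from the project's bibliometrix output.

```python
from statistics import median


def classify_themes(themes):
    """Label each theme by its centrality/density quadrant, split at the medians."""
    centrality_cut = median(t["centrality"] for t in themes.values())
    density_cut = median(t["density"] for t in themes.values())
    labels = {}
    for name, scores in themes.items():
        high_centrality = scores["centrality"] >= centrality_cut
        high_density = scores["density"] >= density_cut
        if high_centrality and high_density:
            labels[name] = "motor"
        elif high_centrality:
            labels[name] = "basic"
        elif high_density:
            labels[name] = "niche"
        else:
            labels[name] = "emerging or declining"
    return labels


# Placeholder themes and scores, for illustration only.
themes = {
    "theme_a": {"centrality": 0.9, "density": 0.9},
    "theme_b": {"centrality": 0.8, "density": 0.2},
    "theme_c": {"centrality": 0.2, "density": 0.8},
    "theme_d": {"centrality": 0.1, "density": 0.1},
}
labels = classify_themes(themes)
```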

08

I used VOSviewer as a second validation layer

I did not want the whole interpretation to depend on one R pipeline, so I exported VOSviewer-ready maps and checked the network structure there as well. The abstract-term co-occurrence map confirmed three major areas: a financial prediction and stock-market cluster, a platform and e-commerce cluster, and a blockchain/energy-trading cluster. Seeing similar structure in a different tool made the final interpretation stronger.

VOSviewer abstract-term co-occurrence network — The VOSviewer network gave me an external check on the clusters detected by the R text-mining and bibliometrix layers.
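
A co-occurrence network of this kind starts from a weighted edge list: for every pair of terms, the number of abstracts in which both appear. The sketch below builds such a list in Python. It is an illustration of the counting step only; the vocabulary, the `min_weight` threshold, and the example documents are assumptions, and the file format needed for import into a network tool is not covered here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(token_lists, vocabulary, min_weight=1):
    """Count, for each term pair, how many documents contain both terms."""
    edges = Counter()
    for tokens in token_lists:
        # Use each term once per document and sort so pairs have a stable order.
        present = sorted(set(tokens) & vocabulary)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return {pair: weight for pair, weight in edges.items() if weight >= min_weight}


# Illustrative tokenized abstracts only.
docs = [
    ["stock", "prediction", "market"],
    ["stock", "prediction"],
    ["blockchain", "energy", "market"],
]
vocabulary = {"stock", "prediction", "market", "blockchain", "energy"}
all_edges = cooccurrence_edges(docs, vocabulary)
strong_edges = cooccurrence_edges(docs, vocabulary, min_weight=2)
```

Clusters in the network view correspond to groups of terms joined by heavy edges, so comparing them with the clusters from the text-mining maps is a reasonable cross-check.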

09

What this project says about how I work

This case became personal because I was the person holding the whole chain together: the search logic, the cleaning rules, the PaperID system, the Shiny dashboard, the figures, the PRISMA flow, and the academic interpretation. I learned that a strong literature-review project is not only about finding papers. It is about making every decision visible enough that another researcher can follow the path from raw search results to final insight.

