Current status is:

First, I changed the algorithm so that input went directly into fixed-dimension matrices rather than through a graph first. But the matrices need more than 60,000 x 60,000 dimensions to encode connections between all the words in a text. (This is with naive one-dimension-per-word coding. There should be dimension savings from moving words to a distributed representation.)
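To make the scale concrete, here is a back-of-envelope memory estimate (my own arithmetic, not a figure from the project) for a single dense float64 matrix at that dimension:

```python
# Memory needed for one dense 60,000 x 60,000 float64 matrix.
vocab = 60_000
bytes_per_cell = 8  # float64
gib = vocab * vocab * bytes_per_cell / 2**30
print(f"{gib:.1f} GiB")  # about 26.8 GiB for a single dense matrix
```

That is already beyond typical workstation RAM before any intermediate results, which is consistent with the memory problems described above.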

Next, I moved the algorithm to sparse matrices. These are a bit awkward to work with, but the main problem with them is speed. And with larger texts the matrices may not actually be that sparse; or perhaps a different sparse matrix format is needed. In any case, I still seem to be running out of memory.
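For reference, a minimal sketch of the kind of construction involved, using the "dictionary of keys" format mentioned below. The adjacent-word counting rule here is my own toy illustration; the project's actual connection rule may differ:

```python
from scipy.sparse import dok_matrix

# Toy text and vocabulary; the real case has tens of thousands of words.
words = "the cat sat on the mat".split()
vocab = sorted(set(words))
index = {w: i for i, w in enumerate(vocab)}

# Count adjacent-word connections in a dictionary-of-keys sparse matrix;
# only nonzero cells consume memory.
m = dok_matrix((len(vocab), len(vocab)), dtype=int)
for a, b in zip(words, words[1:]):
    m[index[a], index[b]] += 1

print(m.nnz)  # number of stored (nonzero) connections
```

DOK is convenient for this incremental, element-by-element filling, but as noted it is slow, and if most word pairs eventually connect, the stored-entry count approaches the dense size anyway.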
Possible next steps include:

Simply using a machine with enough RAM. There will be a ceiling on the dimensions even with naive word coding, because a language only has so many words. Possibly the way to do this is to move testing to a flexible hardware environment such as a cloud computing service. Someone with experience in this area could help here.

Less naive word encoding. Instead of having a dimension for each word, I might code words with a distribution of bits across several dimensions. That would be closer to the cognitive model too. It should not be too hard to do. But it would increase the parallelism of the algorithm, and that might have exponential speed consequences in the short term.
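One way to sketch such an encoding (my illustration, in the spirit of random indexing with sparse bit vectors, not necessarily the scheme the project will adopt): hash each word to a fixed, reproducible pattern of a few set bits in a much lower-dimensional vector.

```python
import hashlib
import numpy as np

DIMS = 1_000        # far fewer than a 60,000-word vocabulary
BITS_PER_WORD = 10  # each word sets a handful of dimensions

def encode(word: str, dims: int = DIMS, bits: int = BITS_PER_WORD) -> np.ndarray:
    """Map a word to a reproducible sparse bit pattern (hypothetical scheme)."""
    # Seed a generator from a stable hash of the word, so the same
    # word always produces the same pattern across runs.
    seed = int.from_bytes(hashlib.sha256(word.encode()).digest()[:4], "big")
    g = np.random.default_rng(seed)
    v = np.zeros(dims, dtype=np.uint8)
    v[g.choice(dims, size=bits, replace=False)] = 1
    return v

v = encode("cat")
print(int(v.sum()))  # 10 bits set
```

With enough dimensions and few enough bits per word, distinct words get near-orthogonal patterns with high probability, which is what makes the dimension savings possible.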

Explore where the symmetries of the connection matrix lie and try a different sparse matrix format: compressed sparse column (CSC), compressed sparse row (CSR), etc. Currently I am using the "dictionary of keys" format, scipy.sparse.dok_matrix. Someone with experience using sparse matrices may have ideas.
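For what it's worth, the usual scipy pattern is to build in DOK (cheap incremental updates) and then convert to CSR or CSC for the arithmetic-heavy steps; conversion is a one-liner:

```python
from scipy.sparse import dok_matrix

# Small example matrix built incrementally in DOK format.
m = dok_matrix((4, 4), dtype=float)
m[0, 1] = 1.0
m[2, 3] = 2.0

csr = m.tocsr()  # fast row slicing and matrix products
csc = m.tocsc()  # fast column slicing
print(csr.nnz)   # 2 stored entries in either format
```

If the connection matrix turns out to be symmetric, storing only one triangle (e.g. via scipy.sparse.triu) would roughly halve memory as well, though that adds bookkeeping to every access.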