best practice for graph-like entities on appengine ndb

gru Source

I'm designing a g+ application for a big international brand. the entities I need to create are pretty much in form of a graph, hence a lot of many-to-many relations (arcs) connecting nodes that can be traversed in both directions. I'm reading all the readable docs online, but I haven't found anything so far specific to ndb design best practices and guidelines. unfortunately I am under nda, and cannot reveal details of the app, but it can match almost one to one the context of scientific conferences with proceedings, authors, papers and topics.

below the list of entities envisioned so far (with context shifted to match the topics mentioned):

  • organization (e.g. acm)
  • conference (e.g. acm multimedia)
  • conference issue (e.g. acm multimedia 13)
  • conference track (e.g. nosql, machine learning, computer vision, etc.)
  • author (e.g. myself)
  • paper (e.g. "designing graph like db for ndb")

as you can see, I can visit and traverse the graph through any direction (or facet, from a frontend point of view):

  • author with co-authors
  • author to conference tracks
  • conference tracks to papers
  • ...

and so on, you fill the list.

I want to make it straight and solid because it will launch with a lot of p.r. and will need to scale consistently overtime, both in content and number of users. I would like to code it from scratch hence designing my own models, restful api to read/write this data, avoiding non-rel django and keeping the presentation layer to a minimum template mechanism. I need to check with the company where I work, but we might be able to release part of the code with a decent open source license (ideally, a restful service for ndb models).

if anyone could point me towards the right direction, that would be awesome.

thanks! thomas

[edit: corrected typo related to many-to-many relations]



answered 5 years ago dragonx #1

There's two ways to implement one-to-many relationships in App Engine.

  1. Inside entity A, store a list of keys to entities B1, B2, B3. In th old DB, you'd use a ListProperty of db.Key. In ndb you'd use a KeyProperty with repeated = True.

  2. Inside entity B1, B2, B3, store a KeyProperty to entity A.

If you use 1:

  • When you have Entity A, you can fetch B1, B2, B3 by id. This can be potentially more consistent than the results of a query.
  • It could be slightly less expensive since you save 1 read operation over a query (assuming you don't count the cost of fetching entity A). Writing B instances is slightly cheaper since it's one less index to update.
  • You're limited in the number of B instances you can store by the maximum entity size and number of indexed properties on A. This makes sense for things like conference tracks since there's generally a limited number of tracks that doesn't go into the thousands.
  • If you need to sort the order of B1, B2, B3 arbitrarily, it's easier to store them in order in a list than to sort them using some sorted indexed property.

If you use 2:

  • You only need entity A's Key in order to query for B1, B2, B3. You don't actually need to fetch entity A to get the list.
  • You can have pretty much unlimited # of B entities.

answered 5 years ago gru #2

after thorough research, this what found:

  • there's not a single design pattern to follow, this of course depends on the specific application and data modeling (fair enough)
  • actions should be taken to avoid hitting the size limits listed at the bottom of this page, mainly for single entity size (1mb), transaction size (10mb) and index limits
  • avoid where possible entity normalization (e.g. having entities used only to create set of arcs) although it seems used by googleplus devs in their demo app for a simple social graph

any other more detailed answer is welcomed

comments powered by Disqus