Topic:   Suppose you have 30,000+ names in a database....   (Read 39357 times)


0 Members and 4 Guests are viewing this topic.

Gan


  • Administrator

  • ^ This guy is amazing.

  • *****


  • Posts: 4411
Suppose you have 30,000+ names in a database....
« on: December 21, 2011, 05:05:55 PM »
And you have to load them and display them when a person searches through the database...
You have to make it load and search as fast as possible, what method would you use?

The database looks like this:
John Edward III
Prince Henry The Awesome
Gan Is Made Of Epic Sauce
MyParents .Hate Me
Blablabla GaGa RaRa Oh LaLa

And there's 30,000+ names, each name on a separate line.

Search specifications:
Every word typed in the search box has to be partially matched with every word of every name.
If this is in the search box: "Jo Ed"
It will display this: "John Edward III"
If this is in the search box: "E"
It will display this:
"John Edward III
Gan Is Made Of Epic Sauce"

My first attempt at making this involved loading each name in an array and when searching, do string comparisons to every word in every name.
Loading the list of names was fast. Searching through them was as slow as me doing math homework.

My second attempt involved having 26 arrays, each for a letter of the alphabet. I would chop up the names and place each name keyword in an array that symbolized the first letter in the keyword.
Loading was fast, and searching was fairly decent.

My latest attempt involved me trying to put to use some of my knowledge from my college courses. I decided to chop all names into keywords, each keyword was linked to an array that connected to every name that contained that keyword, and I sorted all keywords into an array using a binary sort so I could quickly and efficiently search through by using a binary search algorithm.
Well, loading was slow(4 seconds), but searching wasn't too bad. Still a noticeable hiccup.

My idea, is to use the same method as attempt 3. Except I make a keyword not link to multiple names and have multiple of the same keywords except they each link to a unique name. Then I use binary sort to sort them, and binary search to pull them out. I have a feeling this might be faster in both loading and retrieving.

What methods would you guys use?
(BTW I'm tryin to save on ram and if searching is super fast, I could have loading done asynchronously in the background)

GMG Kurt


  • GMG-er

  • **


  • Posts: 682

  • Sorry for being such a noob
Re: Suppose you have 30,000+ names in a database....
« Reply #1 on: December 21, 2011, 07:34:18 PM »
well the first method wouldn't be reliable because they might only put in part of the name, not including the first letter. But I don't understand what you mean by key words, do you mean the word 'nick' have the keywords

nick
ick
ck
n
nic
ni
k
c
i
ic

and all other letter combinations? I think I'd do the third method, but group the keywords into different arrays depending on how many letters are in them, or more preferably a 2-D array, so you don't have to do pointer to arrays and all that mess.
Just your average Weekend Warrior.
Yes I know I have bad spelling, it's what makes me such a good programmer!

"Old art, weather magnificent or wretched, is always the raw material of new art. The artist's job, though, is to

Gan


  • Administrator

  • ^ This guy is amazing.

  • *****


  • Posts: 4411
Re: Suppose you have 30,000+ names in a database....
« Reply #2 on: December 21, 2011, 09:06:14 PM »
Nah, by keyword I mean:
Nick is a keyword of the name "Nick El Back"

And, nah, I don't need to search for stuff in the middle of the keyword, cause I don't expect people to start writing the last half of the name. Just the first half.

I actually had thought of using a 2D array. In this pattern:

  A             B          C
p  r          l     i      a   a
p  a         a     k     t    c
l   n                e          o
e   o                           d
     t                           e
     a                           m
     n                           o
     g                           m
                                  a
                                  n
                                  i
                                  a
Searching would be real fast, as fast as just going down a bunch of arrays until you hit the end. Though with 30,000+ names, I figure this would get very large in memory and probably take a long time to load into arrays. I also figure it could get a bit complex to code cause of how I search through each keyword in a name.

Connors


  • ^ This guy is amazing.

  • ****


  • Posts: 2374

  • It's a secret to everyone...
Re: Suppose you have 30,000+ names in a database....
« Reply #3 on: December 25, 2011, 11:43:45 PM »
I don't know that I could add much. I've only touched on the binary search, but it is a fast way. Interesting ideas.
Warning: The above post may have been modified multiple times.

"In a great game, the character must never perfectly obey the user's command"
 - Tim Rogers

http://connorspuzzles.tumblr.com/

Gan


  • Administrator

  • ^ This guy is amazing.

  • *****


  • Posts: 4411
Re: Suppose you have 30,000+ names in a database....
« Reply #4 on: December 25, 2011, 11:59:44 PM »
Binary search is incredibly fast, but it may not be the best method regarding the criteria of the search.

I think I've found the best solution. I created a tree of words. It's a whole bunch of arrays within arrays. Each array stands for a single letter at a certain position in a word. To retrieve all results related to "Fred", it would have to only go down an array for each letter. The array for F, the array for R that's within F, the array for E within R, the array for D within E. Then it just returns the results that the D array is carrying.
It's intensely fast, though it's memory usage is substantially higher, as well as the loading time. If you can live with longer loading and memory usage, this method is the optimum choice. Instant return of results.

Though I needed to be able to search multiple keywords and get only results related to every keyword. This is where I used a binary search method. I made a result array and took all the results from each keyword and started with the smallest list of keyword results. Then I compared that to every other array of results by using a binary search instead of just incrementing. This sped up the sorting intensely.

Thus, I have a search algorithm that can traverse 1 million names and compare multiple keywords and retrieve results instantly on a machine with a lesser processor: an iPad.


I would post the code but I wrote it at work. :/

Connors


  • ^ This guy is amazing.

  • ****


  • Posts: 2374

  • It's a secret to everyone...
Re: Suppose you have 30,000+ names in a database....
« Reply #5 on: December 26, 2011, 12:09:13 AM »
That's really cool, in a way it's similar to the binary search, as it narrows down the possibilities, one letter at a time the same way you might look through a dictionary. This is starting to remind me of that 20 Questions game that guesses what you're thinking of, it has to narrow down possibilities from a long list. It would look like a pyramid. I believe they set up a website that records people's words and gets smarter at the game all the time.
Warning: The above post may have been modified multiple times.

"In a great game, the character must never perfectly obey the user's command"
 - Tim Rogers

http://connorspuzzles.tumblr.com/

Gan


  • Administrator

  • ^ This guy is amazing.

  • *****


  • Posts: 4411
Re: Suppose you have 30,000+ names in a database....
« Reply #6 on: January 11, 2012, 05:00:54 PM »
I've been working on this problem more.

Having a keyword search Trie allows blazingly fast keyword searches. But! Incredibly slow loading time and incredibly high memory useage.

Having a normal keyword string comparison allows super slow keyword search. But! Incredibly fast loading time and incredibly low memory useage.

I've discovered there is a ratio between the two methods
Search Speed : Loading Speed & Memory Useage


Now here is some benchmarks from testing on an iPhone using 1 million keywords:
Keyword String Comparison:
Loading: 0 sec
Memory Useage: Less than 10mb
Search Time: 30 seconds to1 minute

Keyword Search Trie:
Loading: 30 seconds
Memory Useage: 150-200mb
Search Time: Instant

There's a definite tradeoff between the two. The first is plenty fast enough on a normal computer but not on an iPhone.
The second works well on an iPhone but isn't good in loading ot memory useage.


So my idea is to mix the two and get a good balanced ratio!

If I put a cap on the size of the search trie it means that depending on the size of the cap, the ratio will vary.
A cap of 0 would make the search into a plain keyword comparison. A cap of infinite would make the search into a Trie search.

Here's some data.

I have 1,200,000 keywords.

Here is the number of string comparisons I'd have to do depending on cap size:

0 - 1,200,000 comparisons - 30 sec to 1 min - Loading time 0 sec - <10mb memory useage
1 - 34,000 comparisons - 5-10 seconds
2 - 980 comparisons - 1/3 of a second
3 - 28 comparisons - Instant

Infinite - 0 comparisons - Instant - Loadin time 30 sec - 200mb memory useage


Tomorrow I'm going to attempt this idea. I feel if I can cap the Trie at 2 or 3 it's be sufficiently fast both loading and searching yet super slim in memory useage.

Gan


  • Administrator

  • ^ This guy is amazing.

  • *****


  • Posts: 4411
Re: Suppose you have 30,000+ names in a database....
« Reply #7 on: January 12, 2012, 03:00:10 PM »
It works! :D

Loading time is like 1/4 of what it use to be. Memory usage is 1/2 to 1/4 slimmer as well. Plus searching is still fast enough that you don't see hiccups!

Ideally, the user will never have to wait for it to load and never have wait for the search results.

GMG Tim


  • Administrator

  • GMG-er

  • *****


  • Posts: 465
Re: Suppose you have 30,000+ names in a database....
« Reply #8 on: January 12, 2012, 03:14:33 PM »
And you have to load them and display them when a person searches through the database...
You have to make it load and search as fast as possible, what method would you use?

The database looks like this:
John Edward III
Prince Henry The Awesome
Gan Is Made Of Epic Sauce
MyParents .Hate Me
Blablabla GaGa RaRa Oh LaLa

And there's 30,000+ names, each name on a separate line.

Search specifications:
Every word typed in the search box has to be partially matched with every word of every name.
If this is in the search box: "Jo Ed"
It will display this: "John Edward III"
If this is in the search box: "E"
It will display this:
"John Edward III
Gan Is Made Of Epic Sauce"

My first attempt at making this involved loading each name in an array and when searching, do string comparisons to every word in every name.
Loading the list of names was fast. Searching through them was as slow as me doing math homework.

My second attempt involved having 26 arrays, each for a letter of the alphabet. I would chop up the names and place each name keyword in an array that symbolized the first letter in the keyword.
Loading was fast, and searching was fairly decent.

My latest attempt involved me trying to put to use some of my knowledge from my college courses. I decided to chop all names into keywords, each keyword was linked to an array that connected to every name that contained that keyword, and I sorted all keywords into an array using a binary sort so I could quickly and efficiently search through by using a binary search algorithm.
Well, loading was slow(4 seconds), but searching wasn't too bad. Still a noticeable hiccup.

My idea, is to use the same method as attempt 3. Except I make a keyword not link to multiple names and have multiple of the same keywords except they each link to a unique name. Then I use binary sort to sort them, and binary search to pull them out. I have a feeling this might be faster in both loading and retrieving.

What methods would you guys use?
(BTW I'm tryin to save on ram and if searching is super fast, I could have loading done asynchronously in the background)

Use an SQL database.

Gan


  • Administrator

  • ^ This guy is amazing.

  • *****


  • Posts: 4411
Re: Suppose you have 30,000+ names in a database....
« Reply #9 on: January 12, 2012, 03:23:45 PM »
I might research into that...

How optimized is an SQL database when it comes to search? How much ram and disk space does it take? What about loading speed for 1,200,000 keywords and 200,000 medications?

My algorithm seems to get the job done blazingly fast on the iPhone.

There's only one problem. It takes 10 seconds to load in a background thread. Most people wouldn't ever see the loading progress but if someone quickly logged in and tried to search, they'd most definitely see the end of the database loading.

Will an SQL database search blazingly fast but have a small memory footprint and load quick enough to be unseen?

GMG Tim


  • Administrator

  • GMG-er

  • *****


  • Posts: 465
Re: Suppose you have 30,000+ names in a database....
« Reply #10 on: January 13, 2012, 03:13:56 AM »
I might research into that...

How optimized is an SQL database when it comes to search? How much ram and disk space does it take? What about loading speed for 1,200,000 keywords and 200,000 medications?

My algorithm seems to get the job done blazingly fast on the iPhone.

There's only one problem. It takes 10 seconds to load in a background thread. Most people wouldn't ever see the loading progress but if someone quickly logged in and tried to search, they'd most definitely see the end of the database loading.

Will an SQL database search blazingly fast but have a small memory footprint and load quick enough to be unseen?

I'm sure Apple implements something similar to SQLite in Objective-C. SQL is highly optimized and can retrieve queries in a fraction of a second. An SQLite database will have a small footprint for 1,200,000 keywords but can grow substantially when you get into millions of large entries (e.g. full-text news articles).

Gan


  • Administrator

  • ^ This guy is amazing.

  • *****


  • Posts: 4411
Re: Suppose you have 30,000+ names in a database....
« Reply #11 on: January 13, 2012, 03:58:45 AM »
Hmm. I think I'll research that.

It'll take some time, figure out how to get SQL to do string comparisons on each seperate word in a search result...
Seems like it'd be a bit complex.